SYSTEM AND METHOD FOR MANAGING CONTENT
INCLUDING ADDR^^^ 

CROSS-REFERENCE TO RELATED APPLICATIONS 

[01] This application claims priority to U.S. Provisional Application Serial Number
60/434,418 entitled "FILE MANAGEMENT SYSTEM AND METHOD" which was
filed on December 19, 2002, and which is incorporated herein by reference in its entirety.
This application is also related to corresponding U.S. Patent Application entitled "System
and Method for Managing Content," Attorney Docket Number 25396-003; U.S. Patent
Application entitled "System and Method for Managing Versions," Attorney Docket
Number 25396-005; U.S. Patent Application entitled "System and Method for Managing
Content With Event Driven Actions to Facilitate Workflow and Other Features,"
Attorney Docket Number 25396-006; and U.S. Patent Application entitled "Graphical
User Interface for System and Method for Managing Content," Attorney Docket Number
25396-007, filed simultaneously herewith, each of which is incorporated herein by
reference in its entirety.

FIELD OF THE INVENTION 

[02] The present invention relates to an integrated system and method for managing 
files, messages and other digital content that facilitates categorization of information, 
provides version control, allows event-driven actions including control of workflow,
permits sharing and access control of files, is transactionally-based to permit easy
historical viewing and undoing of a wide variety of changes to files and folders and other
features, and a graphical user interface to facilitate access to and use of such a system.

BACKGROUND OF THE INVENTION

[03] Computers have revolutionized the storage, retrieval and use of information. As 
the cost and size of computer memory have gone down, the amount of information
accessible to a user has increased substantially. The expansion of networks, including
global networks, such as the Internet, has also greatly contributed to this growth. This
growth has greatly outpaced the ability of existing systems to find, share and organize 
that information. 

[04] Originally, electronic file systems were based upon simple filing concepts from
paper files. Files were organized into folders and subfolders, just like documents in filing
cabinets. As the number and types of files have grown, the inadequacies of the early
systems have become increasingly apparent. In the physical environment, as the number
of filing cabinets increased, indexing systems were developed to locate specific files or
documents. Such systems are still used in controlling physical [...]. In the
electronic realm, similar file management systems have [...]. However,
networks have changed the nature of file storage. A user is no longer limited to the files
on a single computer. Instead, a single user can create, store, access, modify and copy
files on any number of machines, including their own computer, network servers, and
even co-workers' computers. Additionally, others on a network may be creating, copying,
and modifying those same files. The exploding use of email has also contributed to
current problems. Emails are also retained and they need to be organized and controlled,
so that they can be later located, accessed and used. Within existing computer filing
systems, disorganization is rampant, and it can be hard to find things. In recent years,
various disparate applications have emerged to solve some aspects of the problems:
Version Control systems, Document Management systems, Workflow systems,
Configuration Management systems, Archiving systems, Backup systems, general
purpose databases, etc. These applications are yet other places to store files, in systems
that have to be learned/maintained, backed [...]

[05] One of the many problems with existing electronic filing systems is the creation
of copies. It is very easy to copy a file. There are also important reasons why a copy of a
file may be better than the original, in terms of accessibility and convenience. However,
the creation of many copies further increases the disorganization of filing systems.
Studies have shown that most of the files on people's computers and disks are copies of
files from other computers on the network, from read-only media, and from their own
computer.

[06] The creation of copies can be very confusing. The original file may be changed, 
or the copy may be changed. Then, they are no longer exact copies, but a user can easily 
lose track of which is the correct one. Many times the creator of a copy forgets about it 
or why it was created. The copy then continues to exist, using valuable storage and name 
space, but without any purpose. The vast majority of copies are not necessary. 
Therefore, a need exists for a file management system with improved performance such
that the need for copies is limited. Furthermore, a need exists for a file management
system that maintains information about copies of files so that their use and relationship to
other files can be easily determined.

[07] Another problem with current file systems is that different users may use different 
approaches to file organization. This leads to difficulties in finding and sharing files.
Another problem is the way that access control and sharing are managed. The sharing
and access control features in the Windows(TM) operating system, for example, are very
difficult for the average user to make sense of, to use and to maintain. An advanced user
is typically needed to establish and maintain file sharing groups and related mechanisms.
Improper sharing and access control may allow access to information that should not be
disclosed, or files that should be shared may be inaccessible. Therefore, a need exists for
a file management system that allows simple management of access control and file sharing.

[08] Locating a desired file is another complicated process in existing systems. Each
computer or disk drive is often searched separately, even though information may be
stored on several different, interconnected computers. Even if a search looks for a file
on multiple computers, the search results can be misleading or incomplete. The problems
with copies may mean that a search may produce many duplicate results and results that
do not include the best version. The system provides little, if any, assistance in
determining which is the proper (e.g., current) file. Therefore, a need exists for a file
management system that allows searching on multiple computers and organizes results in
a useful manner.

[09] It is well known that it is advisable to maintain backup copies of files in case of
corruption, loss, or other problems. However, there are numerous problems with backup
systems. Often, backup systems are not installed or operated on a regular basis.
Sometimes, backups do not succeed when scheduled. Very often, only essential servers
are backed up; the files on individual computers typically are not regularly backed up.
Additionally, locating and retrieving a backup file can be difficult. Therefore, a need
exists for a file management system that simplifies the backup and restoration processes.
Other drawbacks exist.

SUMMARY OF THE INVENTION 

[010] An object of the invention is to overcome these and other drawbacks. The present
invention substantially overcomes the deficiencies of the prior art through a novel file
management system. According to one aspect of the invention, the file management
system includes an object oriented file management database. The file management
system includes a volume manager and a coherency manager. The volume manager
manages a set of volumes. Each volume may include folders, files and other digital
content, and it may reference other volumes. The coherency manager, among other
things, facilitates consistency among multiple volume managers. According to another
aspect of the invention, a novel user interface for interacting with the file management
system is provided.

[011] Unlike conventional file management systems, the file management system of the
present invention is content addressable and self-organizing to facilitate categorization of
information, includes a publish/subscribe capability and event-driven actions to facilitate
sharing and access control of files and workflow, and is transactionally-based to facilitate
the ability to enable a historical view showing actions performed on a file or folder and
restoring files and folders to states prior to a change. As detailed below, these and other
aspects of the invention enable a number of advantageous features.

[012] According to one embodiment, implementation of the content addressability 
feature includes the use of tags. Tags are name-value pairs that describe folder or file
attributes. Tags can have a single value or, in some cases, multiple values. According to
one aspect of the invention, some tags may be system generated tags and others may be
user selected tags. Via the user interface, for example, by right clicking on a file or folder
and selecting tags from a menu, a user can open a window showing the item's tag
information and can view and/or change tag information.
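
By way of illustration only, tags as name-value pairs with single or multiple values might be modeled as in the following sketch. The class name Item, its methods, and the example tag names are assumptions made for this example and do not appear in the application.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A file or folder carrying name-value tags (hypothetical model)."""
    name: str
    tags: dict = field(default_factory=dict)  # tag name -> set of values

    def add_tag(self, key: str, value: str) -> None:
        # A tag may hold one value or several, so values are kept in a set.
        self.tags.setdefault(key, set()).add(value)

    def has_tag(self, key: str, value: str) -> bool:
        return value in self.tags.get(key, set())


photo = Item("half_dome.jpg")
photo.add_tag("type", "JPEG")         # stands in for a system generated tag
photo.add_tag("subject", "Yosemite")  # user selected tags
photo.add_tag("subject", "Jane")
```

Keeping the values of each tag in a set lets the same structure serve both single-valued and multi-valued tags.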

[013] According to another aspect of the invention, each volume can include one or
more folders. A folder may be configured to be a view of the database and include
pointers to the files associated with that view. This enables the contents of a folder to be
constructed and maintained dynamically. According to another aspect of the invention,
various folder types may be used. By way of example, the folder types may include one
or more of a query folder, a search folder, a merge folder, a magnetic folder, a typed
folder and other types of folders.

[014] A query folder is a folder that generates a query (e.g., based on the folder name or
based on a tag attached to the folder, or otherwise) into the file management database. A
query folder encapsulates a set of search criteria and includes real-time-updated results of
the search. If a file is later changed so that it matches the query, it will be added to the
corresponding query folder. Similarly, if a file is later changed so that it no longer
matches the query, it will be removed. The search can be a full-text search across one or
more volumes, or it can be a tag search, where the query searches tags that have certain
values. Other search techniques may also be used. Matching objects are then associated
with that query folder.
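
The real-time membership behavior of a query folder based on a tag search might be sketched as follows. This is a minimal illustration under stated assumptions: the class QueryFolder, the callback on_item_changed, and the "status" tag are hypothetical names, and a real system would query the file management database rather than an in-memory set.

```python
class QueryFolder:
    """Hypothetical folder whose contents are the live results of a tag query."""

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.members = set()

    def matches(self, tags):
        return self.value in tags.get(self.key, set())

    def on_item_changed(self, name, tags):
        # Called whenever an item's tags change: add the item if it now
        # matches the query, remove it if it no longer does.
        if self.matches(tags):
            self.members.add(name)
        else:
            self.members.discard(name)


drafts = QueryFolder("status", "draft")
drafts.on_item_changed("report.doc", {"status": {"draft"}})
drafts.on_item_changed("memo.doc", {"status": {"final"}})
drafts.on_item_changed("report.doc", {"status": {"final"}})  # no longer matches
drafts.on_item_changed("memo.doc", {"status": {"draft"}})    # now matches
```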

[015] A search folder is a folder that has associated with it search criteria for searching
contents of files or other digital objects. Matching objects are then associated with that
search folder. According to one aspect of the invention, the volume manager supports
integration with free-text search software. When any application changes the contents of
a file (or folder), the normal sequence is for the file to be opened, written to, and then
closed. The volume manager processes each of these requests. When it determines that a
file has changed, a sequence of actions is processed. One of these actions can include
queuing the file to a search engine for indexing. In a similar way, immediately after a file
is erased, a request to remove the file from the index is queued to the search engine.
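
The queuing of index and removal requests described above might look like the following sketch. The function names, the was_modified flag, and the request tuples are assumptions made for illustration; the application does not specify the interface to the search engine.

```python
from collections import deque

index_queue = deque()  # requests waiting for the (assumed) search engine


def on_file_closed(path, was_modified):
    # The volume manager sees open, write and close requests; a close
    # after a write means the contents changed, so queue for indexing.
    if was_modified:
        index_queue.append(("index", path))


def on_file_erased(path):
    # Immediately after an erase, queue removal of the file from the index.
    index_queue.append(("remove", path))


on_file_closed("/vol/a.txt", was_modified=True)
on_file_closed("/vol/b.txt", was_modified=False)
on_file_erased("/vol/a.txt")
```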

[016] According to one embodiment, the system recognizes folders with specially
formed names, or with special tags, as being search folders or query folders. When such
a folder is recognized, a search string is extracted from the folder name or from specific
tags, and passed to a search engine. The results of the search are shown as familiar
files-in-folders. If the search query is presented in the form of a folder name or a tag value, it
is persistent. The search strings can include complex se[...]
boolean operations. When a file is created or is changed so that it matches an active
search folder, the name of the file will appear in that folder without any additional
intervention by the user. Files can also be specially marked to prevent indexing. Other
aspects of searching are facilitated by the invention.

[017] A merge folder is a folder (or overlay) that combines two or more folders (e.g.,
using boolean logic or otherwise). A merge folder can include items from a 'merge list'
of other folders. An item in a folder in the merge list hides a like-named item in a folder
farther down in the merge list. According to one embodiment, the merge is real-time, not
a snapshot. As items appear and disappear in the merged folders, they appear and
disappear in the merge folder contents. A merge folder can be configured to allow
creation of new items in the first folder in the merge list, and it can be configured to
allow the system to delete items from where they reside or merely to hide them from
appearing in the merge folder. Items from the source folders can appear in the merge
folder as sync links. Preferably, the system uses a combination of query folders and
merge folders to implement o[...]
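
The hiding rule of the merge list, in which an item in an earlier folder hides a like-named item in a later folder, might be sketched as follows. The function merge_view and the folder and item names are hypothetical.

```python
def merge_view(merge_list):
    """Return {item name: source folder name} for an ordered merge list.

    merge_list is a list of (folder_name, item_names) pairs; an item in an
    earlier folder hides a like-named item in a later one.
    """
    view = {}
    for folder_name, items in merge_list:
        for item in items:
            view.setdefault(item, folder_name)  # first folder in the list wins
    return view


view = merge_view([
    ("local", ["config.ini", "notes.txt"]),
    ("shared", ["config.ini", "logo.png"]),
])
```

Recomputing the view each time the merge folder is listed, rather than storing the result, is one simple way to approximate the real-time behavior described above.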

[018] A magnetic folder "attracts" files with certain tag values. For example, magnetic
folders disable automatic removal if a file ever matches a query or other criteria.

[019] Typed folders are folders that include files or other content that have certain
characteristics. For example, a typed folder can limit what types of files can be located in
the folder (e.g., only PDF files), it can prevent certain types of files from being located in
the folder and can require certain content. For example, a 'Group Role' folder can be
allowed to include only 'User' files and 'Group Access' folders.
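
A typed folder that limits its contents to certain file types might be sketched as follows. Identifying the type by filename extension is an assumption made for brevity, and the class TypedFolder is a hypothetical name.

```python
import os


class TypedFolder:
    """Hypothetical folder that accepts only certain file extensions."""

    def __init__(self, allowed_extensions):
        self.allowed = {ext.lower() for ext in allowed_extensions}
        self.items = []

    def add(self, filename):
        ext = os.path.splitext(filename)[1].lower()
        if ext not in self.allowed:
            raise ValueError(f"{filename!r} is not an allowed type")
        self.items.append(filename)


pdf_only = TypedFolder({".pdf"})
pdf_only.add("spec.pdf")
try:
    pdf_only.add("notes.txt")
    rejected = False
except ValueError:
    rejected = True
```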

[020] According to another aspect of the invention, changes to folders and files are
handled on a transactional basis. This enables the system to retain information regarding
the creation, modification, and uses of a file or its attributes, maintain information
regarding relationships between files, control access to files based upon the stored
information and provide other advantages. This aspect of the invention facilitates an
item history feature. Each time an item is copied, moved, deleted, saved, renamed, etc.,
the volume manager keeps a record of one or more of what was done, by whom, when,
why and other desired information. This information may be seen by choosing an item
(e.g., by right-clicking the item from the user interface) and selecting "Show History." In
some embodiments, this brings up a window that shows one or more of where this item
was copied from and to, who did it, when, why and other desired information. The Item
History for a folder can also include a list of items that used to be in the folder but which
were either deleted or moved from the folder. The user can open and explore these items
if desired (they will be frozen as discussed below). These items can be restored by
selecting 'Undelete' or 'Bring back' from a menu.

[021] An 'undo' option lets a user undo other previous commands. When a user right
clicks on a file or folder and selects the 'Undo...' menu item, this brings up a dialog box
that describes a list of things done to the item and the option to undo one or more of
them. The undo feature applies to whole folder hierarchies as well as to individual or
collections of files. Other changes to files and folders can be viewed and undone in
accordance with the present invention.
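
A record of what was done, by whom, when and why, together with an undo of the most recent action, might be sketched as follows. Only renaming is modeled, and the class ItemHistory and its fields are assumptions. As a simplification, undo here discards the log entry, whereas a transactional system would more likely record the undo as a further transaction.

```python
import time


class ItemHistory:
    """Hypothetical per-item transaction log supporting undo of renames."""

    def __init__(self, name):
        self.name = name
        self.log = []  # one dict per action: what, who, when, why, undo data

    def rename(self, new_name, who, why=""):
        self.log.append({"what": "rename", "who": who, "when": time.time(),
                         "why": why, "old": self.name, "new": new_name})
        self.name = new_name

    def undo_last(self):
        # Reverse the most recent recorded action, if there is one.
        if not self.log:
            return False
        entry = self.log.pop()
        if entry["what"] == "rename":
            self.name = entry["old"]
        return True


item = ItemHistory("draft.doc")
item.rename("final.doc", who="alice", why="approved")
recorded_who = item.log[-1]["who"]
item.undo_last()
```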

[022] The system further permits a user to select a 'Show versions' menu item. This
displays all extant past versions, which are all frozen. The user can drag these versions to
somewhere, open them, compare them with other versions, or perform other file
operations. They are just files and folders (except they're frozen). To make a previous
version become the latest, most current version again, the user can right click on an old
version and select the 'Make Current' command. The item will then be reinstated as the
current version.

[023] These features facilitate simple tasks like undeleting a file but also provide a
broader range of novel features including the ability to undo a renaming of a file or folder
and other changes made to the file or folder.

[024] Another feature accessible from the user interface is the ability to freeze files or
folders. When a file is frozen, both the contents of the file and the tags attached to it are
made permanently read-only. A file or a folder and all of its contents (recursively) can be
frozen. When this occurs, no one, not even a super-user or administrator, can make it
modifiable. Yet it can still be read. When an item is frozen, the user can be assured that
the item is truly a snapshot taken when it says it was taken and that everything in it is as it
was, nothing added, nothing changed, nothing removed.

[025] According to one embodiment, every file has an inspectable
cryptographically-strong hash code (using the SHA-1 algorithm, for example). The user
interface permits verification so that this hash code can be used to verify that the content
really is intact, and that no error or hacking has changed the content. The hash code may
also be used for digital signatures.
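
Verification of content against a recorded hash code might be sketched as follows, using the SHA-1 algorithm that the text gives as an example. The function names content_hash and verify are assumptions; hashlib is the Python standard library module.

```python
import hashlib


def content_hash(data: bytes) -> str:
    # SHA-1 is the algorithm the text gives as an example.
    return hashlib.sha1(data).hexdigest()


def verify(data: bytes, recorded_hash: str) -> bool:
    # The content is intact only if its hash equals the recorded one.
    return content_hash(data) == recorded_hash


original = b"quarterly report"
recorded = content_hash(original)
```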


[026] Another aspect of the invention relates to versioning and saving. The system
permits saving a file from an unmodified application, or a user can choose the 'Save as
Version' menu item. The 'Save as Version' command takes a snapshot of an item by
making a copy of it, freezing the copy so it will never change, and associating it with
other past versions of the item. A user can access any past version and copy it, link to it,
or move it, but it can't be modified, since it will be frozen. When a snapshot is
performed, the volume manager also records who, when, and optionally, why (if a user
chooses to supply a comment or have the system do so automatically). Taking a snapshot
of a folder is similar except that the volume manager saves a frozen copy of everything
under the folder.
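
The 'Save as Version' behavior, in which a frozen copy is recorded along with who, when and why, might be sketched as follows. The classes Version and VersionedFile are hypothetical, and a frozen dataclass stands in for the freezing mechanism, which the application does not detail.

```python
from dataclasses import dataclass
import time


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned
class Version:
    content: bytes
    who: str
    when: float
    why: str = ""


class VersionedFile:
    """Hypothetical file with a list of frozen past versions."""

    def __init__(self, content: bytes):
        self.content = content
        self.versions = []

    def save_as_version(self, who, why=""):
        # Snapshot the current content; bytes objects are immutable, so the
        # stored copy cannot change when self.content is later replaced.
        self.versions.append(Version(self.content, who, time.time(), why))


doc = VersionedFile(b"v1 text")
doc.save_as_version("alice", why="before review")
doc.content = b"v2 text"
```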

[027] Another aspect of the invention relates to event driven actions including triggers
and constraints. Anything done to a file or a folder can be an event that can trigger an
action. A constraint can be a required event or condition that must occur or exist before a
certain action can occur. For example, it can prevent a file from being published before
certain approvals are obtained. Numerous other uses exist for triggers and constraints.
To use this feature, a user can select from many pre-programmed actions and customize
them with drag and drop and form fill-in. In some embodiments, actions can be
programmed by the user. The combined result of all programmed actions enables the
system to react in real time. As an example, the system uses event-driven actions to
notify the right people when a work product file is ready for them to review or to use in
some other part of a project. Using event-driven actions, a user can build complex
workflow automation into folders and files.
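
The interplay of constraints and triggers, using the publish-after-approval example from the text, might be sketched as follows. The class Folder, its methods on, require and fire, and the "approved" field are assumptions made for illustration.

```python
class Folder:
    """Hypothetical folder with event-driven triggers and constraints."""

    def __init__(self):
        self.triggers = {}     # event name -> list of actions
        self.constraints = {}  # event name -> list of predicates

    def on(self, event, action):
        self.triggers.setdefault(event, []).append(action)

    def require(self, event, predicate):
        self.constraints.setdefault(event, []).append(predicate)

    def fire(self, event, item):
        # Every constraint must hold before the event is allowed to proceed.
        if not all(check(item) for check in self.constraints.get(event, [])):
            return False
        for action in self.triggers.get(event, []):
            action(item)
        return True


notified = []
folder = Folder()
folder.require("publish", lambda item: item.get("approved", False))
folder.on("publish", lambda item: notified.append(item["name"]))

blocked = folder.fire("publish", {"name": "spec.doc"})
allowed = folder.fire("publish", {"name": "spec.doc", "approved": True})
```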

[028] Another feature of the user interface is the ability to easily manipulate lists.
According to this aspect of the invention, in list view, a user can sort by column as usual,
but in addition, can configure any column to show the contents in 'my order'. When the
folder display is in this mode, a user can rearrange the order of folder items using drag
and drop techniques. The folder subsequently remembers the user's ordering.

[029] Various aspects of the volume manager and coherency manager facilitate various 
other aspects of the invention. One such aspect of the invention relates to smart copies.

Attorney Docket No. 25396-004

The volume manager eliminates many scenarios that would have necessitated making
copies. The primary scenario where a true copy is useful is where a user wants to modify
one copy in one way and another copy in another way. For these and other reasons, the
smart copy feature of the volume manager encompasses several enhancements over
traditional file copies. According to one embodiment of this aspect of the invention, the
system permits live copies and deferred copies, and provides other copy-related
benefits.

[030] According to this aspect of the invention, when the system makes a live copy of a
file named A to a file named B, it makes both A and B refer to the same underlying file.
If a user modifies file A, file B reflects the change immediately. Deleting file A or B has
no effect on the other file. If a new version of one file is made, then the other filename
will refer to that new version. The coherency manager permits live copies to be on
different volumes. Live copies can refer to folders as well as files.
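
A live copy, in which two names refer to the same underlying file, might be sketched as follows for a single volume. The class Volume and its internal tables are assumptions; live copies across volumes, which the text attributes to the coherency manager, are not modeled.

```python
class Volume:
    """Hypothetical volume: names map to shared underlying file objects."""

    def __init__(self):
        self.data = {}   # file id -> content
        self.names = {}  # name -> file id
        self._next_id = 0

    def create(self, name, content):
        self.data[self._next_id] = content
        self.names[name] = self._next_id
        self._next_id += 1

    def live_copy(self, src, dst):
        self.names[dst] = self.names[src]  # both names share one file

    def write(self, name, content):
        self.data[self.names[name]] = content

    def read(self, name):
        return self.data[self.names[name]]

    def delete(self, name):
        file_id = self.names.pop(name)
        if file_id not in self.names.values():
            del self.data[file_id]  # last reference is gone


vol = Volume()
vol.create("A", "hello")
vol.live_copy("A", "B")
vol.write("A", "hello, world")
seen_through_b = vol.read("B")
vol.delete("A")
```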

[031] The live copy feature facilitates organization of data, in part, because it lets a user 
put the same file or folder inside more than one folder. For example, a photo can be in 
both the Yosemite folder and the Jane folder. In reality, the folders each include a
reference to the same physical file. So if the photo is changed, the change will be
reflected in the "copy" in each folder. 

[032] Another aspect of the invention relates to deferred copies. When the system
makes a "regular" copy of an original file named A to a copy named B, the volume
manager knows that the names refer to copies of the same file. This uses only a small
amount of additional disk space. Initially both the original item and the "copy" share the
same data. However, at the time that a user modifies either the file called A or the one
called B, the volume manager will make a copy of the single underlying file, and each of
the two names will refer to its own separate data. This applies to files, folders and other
items. In the case of folders, only when files are modified in one or the other copy does
the volume manager actually need to allocate space for the new, modified copy.
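
The deferral of the actual data copy until the first modification, commonly called copy-on-write, might be sketched as follows. The class CowVolume and its blob table are assumptions made for illustration and do not reflect how the volume manager stores data.

```python
class CowVolume:
    """Hypothetical volume with deferred (copy-on-write) copies."""

    def __init__(self):
        self.blobs = {}  # blob id -> bytearray
        self.names = {}  # name -> blob id
        self._next_id = 0

    def _new_blob(self, content):
        self.blobs[self._next_id] = bytearray(content)
        self._next_id += 1
        return self._next_id - 1

    def create(self, name, content):
        self.names[name] = self._new_blob(content)

    def copy(self, src, dst):
        self.names[dst] = self.names[src]  # no data is copied yet

    def append(self, name, more):
        blob_id = self.names[name]
        shared = sum(1 for b in self.names.values() if b == blob_id) > 1
        if shared:
            # First modification: only now is the data actually duplicated.
            blob_id = self._new_blob(self.blobs[blob_id])
            self.names[name] = blob_id
        self.blobs[blob_id] += more


vol = CowVolume()
vol.create("A", b"data")
vol.copy("A", "B")
blobs_before = len(vol.blobs)
vol.append("B", b"+edit")
```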

[033] After copying file A to a new file B, very little additional disk space is needed
because of the deferred copy feature. File A will remember that it was copied to file B,
and file B will remember that it was copied from file A. This information can be seen in
the user interface and it can be used to navigate from one copy to another. File A and file
B share the same list of previous versions. If we modify A and then also modify B, the
current versions will differ, but both still share all of the same previous versions.
Normally, when a file is copied, the copy is associated with the same current version and
all the same previous versions. But if desired, a user can copy a past version of A to a
new file C, and then modify C. Now A and C differ, but the ancestry they share is the
same up to the point where the copy was made.

[034] Another aspect of the invention relates to smart links. Windows has shortcut
files. Mac OS has alias files. Unix has symbolic links and hard links. The invention
supports these features and more. A link is a reference to whatever is at the end of the
given path. The path can be relative, absolute, or it can be a URL. With adequate
permissions, a user can make the link "sticky." A sticky link gets to dictate attributes of
what it points to: the file type (such as a PDF file), whether there has to always be
something there at the end of the path, and whether the link will adjust to point to the new
location if the reference moves. A link can be configured to behave like a Mac OS alias,
Windows shortcut, or Unix symbolic link or hard link, appropriate to the platform from
which it is accessed. A link can also be configured to keep a cached copy of whatever
was there the last time the link was used. The link might include a cached copy of a
remote web page or a folder on a remote web site, for example.

[035] Another aspect of the invention relates to [...]. When a user
accesses volume A on server X from client machine Y, the volume manager on machine
Y creates an entry for volume A in its local disk cache. From then on, even if the user
disconnects from server X, the user can still work on volume A from client machine Y,
using whatever is cached locally. Preferably, the user can request that certain files from
volume A always be cached on the client machine, in case the user disconnects or in
case the server goes down. To do this, the user can select an item on volume A, right
click, and then select the 'Keep local' menu item from a pop-up menu. If the user sets
'Keep local' on a folder, all of that folder's contents, recursively, are affected. If the user
also wants to protect against the item being deleted, the system can make a Live Copy.

[036] The volume manager on client machine Y works unobtrusively in the background
to ensure that 'keep local' items remain in sync with the server. If the user disconnects Y
from the network then reconnects, the volume manager will synchronize the cache with
the server. If the user made any changes in the local cache while disconnected, there may
be conflicts with changes on the server. In this case, the user interface will help the user
reconcile differences. The user interface's compare-merge tools facilitate this.
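
Detecting which cached items need to be sent to the server, fetched from it, or reconciled by the user might be sketched as a three-way comparison against the state at the last synchronization. This approach, the function reconcile, and the representation of each side as a map from path to content hash are all assumptions; the application does not describe the synchronization algorithm.

```python
def reconcile(base, local, server):
    """Three-way comparison of {path: content hash} maps.

    base is the state at the last synchronization. Returns the paths to
    push, the paths to pull, and the paths changed on both sides.
    """
    push, pull, conflicts = [], [], []
    for path in sorted(set(base) | set(local) | set(server)):
        b, l, s = base.get(path), local.get(path), server.get(path)
        if l == s:
            continue                # both sides already identical
        if l != b and s != b:
            conflicts.append(path)  # changed on both sides, differently
        elif l != b:
            push.append(path)       # changed only in the local cache
        else:
            pull.append(path)       # changed only on the server
    return push, pull, conflicts


push, pull, conflicts = reconcile(
    base={"a": "1", "b": "1", "c": "1"},
    local={"a": "2", "b": "1", "c": "2"},
    server={"a": "1", "b": "3", "c": "4"},
)
```

Only the paths returned in the conflicts list would need to be presented to the user for reconciliation.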

[037] Another aspect of the invention relates to a smart backup feature. The volume
manager handles backups in an automated way. As files are changed, they are sent over
the network to another machine running a copy of the volume manager, which has been
designated as the 'backup server'. The versioning features make a volume an ideal store
for backups because it has adequate expressive power to accurately represent the history
of the backed-up data. Also, the system's transactional characteristics are ideal for
backup because the backup can be guaranteed to be a consistent snapshot.

[038] Backups happen continuously, slowing down only when there's nothing to do or
to get out of the way while a user is using the computer. Whenever there is idle time, at
night, at lunch, while a user is on the phone, backups can go at full speed.

[039] To arrange for backup of a folder, the user right-clicks on the folder and selects
the "Backup..." menu item. The user then designates a folder on another volume where
a redundant copy of this folder and its versions is to be kept from now on.
Features in the user interface will assist the user in locating a volume manager on their
network that is an appropriate receptacle for their backups. Such a machine would often
be (but does not have to be) a dedicated, unattended server (called a 'backup drone'),
shared by multiple users. The user interface will also help the user identify an
appropriate place to store their files on the backup machine. For example, there could be
a specific part of the backup machine's folder hierarchy that has been designated for
backups. Typically, the folder being backed up will be the root folder of a volume. The
backup drone will generally be up and connected 24x7. It may have RAID disks, it may
be a member of a cluster, and it may in turn back up to another drone off-site.

[040] Backups are useful for at least two classes of problems: disaster recovery and
undo. Disaster recovery is easily handled by copying an entire folder or volume from
backup as of the most recent backup. Undo allows a user to retrieve deleted items and
past versions of modified items. As discussed earlier, undo of recent deletions and
modifications doesn't require backup, since the volume manager keeps recent versions on
the local disk. Eventually, however, enough old versions may accumulate on the local
disk that the volume manager will need to delete some of them, counting on a backup
volume to supply the data if it's needed. If an undo involves data that has been deleted
from the local volume, the user interface transparently retrieves the needed data from the
backup volume. The undo operation is a little slower, but otherwise operates similarly.

[041] As can be seen, these various features, functioning together, permit great synergy
and provide unique functionality not heretofore believed to be known. By way of
example, the freezing feature is particularly beneficial to reliably storing past versions.
The deferred copies feature makes the folder snapshot feature practical because it
requires minimal disk space. Another useful versioning feature is the ability to view a
folder hierarchy or an entire volume as of a given time. This 'as of' view uses frozen
items. Various other synergies exist.

BRIEF DESCRIPTION OF THE DRAWINGS 

[042] Fig. 1 illustrates complexity in access control associated with a conventional
system.

[043] Fig. 2 illustrates a server system that can utilize a file management system
according to an embodiment of the present invention.

[044] Fig. 3 illustrates various components of a file management system according to an
embodiment of the present invention.

[045] Fig. 4 illustrates communications in a file management system according to an
embodiment of the present invention.

[046] Fig. 5 illustrates a block diagram of a file management system according to an
embodiment of the present invention.

DETAILED DESCRIPTION

[047] Fig. 2 illustrates a computer system 100 to which the file management system of
the present invention can be applied. As illustrated in Fig. 2, the computer system 100
includes a server 110 and a terminal device 120. The terminal device 120 may be a
computer. Alternatively, it may be any other device wh[...]
server in order to access files, such as a PDA, an MP3 player, a cellular phone, an electronic
gaming system, etc. The server 110 includes at least one memory volume 111 and at
least one volume manager 112. The terminal device 120 is connected to the server 110
by a wired or wireless communication link 130 in order to access data on the server 110.
The communication link 130 connects to the volume manager 112 in order to access the
memory volume 111 on the server. Alternatively, the terminal device 120 may include
its own volume manager 121 for directly accessing the memory volume 111 on the server
110. Preferably, the volume manager 112 is a software application operating on the CPU
of the server which provides functionality as discussed below. Alternatively, the volume
manager 112 may be implemented in hardware or operate on a machine separate from
that having the memory.

[048] Fig. 3 illustrates components of a software application providing the functionality
of the file management system according to an embodiment of the present invention. The
file management system includes a user interface 210, a volume manager 220 and a
coherency manager module. Other software modules may be used and functionality
described herein as being performed by one module may in some cases be performed in
whole or in part by another module. The various software modules may be installed on
each computer or other device which utilizes the file management system of the present
invention and on one or more servers or central computers. These software modules may
operate in conjunction with existing software on those machines. In particular, the user
interface 210 and the volume manager 220 function in connection with the existing file
system on the computer, for example, a Windows file system 251. The user interface 210
includes at least one of two alternative components: a set of plug-in extensions 211 to
Windows Explorer 250 (or other such application) and a separate user interface
application 212. The plug-in extensions 211 allow users to access the functionality of the
novel file management system utilizing familiar formats and displays (e.g., within a
Windows Explorer or other environment). The user interface application 212 provides an
alternative interface and may include additional functionality. Also, the user interface
application can be used for devices which do not include Windows Explorer.

[049] In one embodiment, a volume is a unit of file storage typically associated with a
disk partition, or with a Windows 'drive letter'. This embodiment utilizes specific
memory volumes created for use with the file management system. In some
embodiments of the invention, a memory volume 111 within the present invention can be
a physical volume, residing on a disk partition initialized for use with the file
management system. In other embodiments, memory volume 111 may be a virtual
volume whose data is stored inside a hidden folder on an existing OS volume, such as
NTFS 252 in a Windows file system 251. The volume manager 221 manages the
contents of one or more memory volumes 111.

[050] The volume manager 221 may be enabled for network access. A proprietary
protocol is used to communicate with the volume manager 221. Fig. 4 illustrates the
components of a file management system enabled for network access. A TCP/IP
connection is used to communicate with the various components operating on the
memory. The volume manager 221 connects to a client over a TCP/IP connection, using
a unique file protocol. A Windows file protocol 254 may be used to communicate with a
Windows file sharing application 253 for control of data not within the file management
system of the present invention. The protocol may be implemented in Extensible Markup
Language (XML), with variations and enhancements that include HTTP, Java Remote
Method Invocation (RMI) and raw binary streams. The protocol stream may be
compressed and/or encrypted. A group of servers may be used to replicate the same data
and appear to users as a single server, to provide high availability and improved
throughput.

[051] The volume manager 221 operates on the memory volume 111 to provide certain
functionality. The user interface 210 allows a user to access the functionality. The
volume manager 221 is able to provide the functionality through specific control of
information in the database relating to the memory volume 111 and through
synchronization and linking processes. The functionality of the volume manager 221 is
described below.

[052] According to one embodiment, the volume manager 221 may create live copies of
files. A file named A can be live copied to a file named B, and then either file A or file B
can be live copied again to a file named C. The underlying data referenced by the three
different filenames is the same, so a change to any one of the files will result in those
changes being immediately visible through any of the live copies. However, deletion of
one copy does not delete any other copies. The live copies are associated in the database
of the volume manager 221.

[053] According to one embodiment, the live copies can be located in different folders.
Thus, multiple copies of files can be organized in different manners while maintaining
the same content. Since all files are managed by the volume manager 221, live copies
also can be located in different volumes. Additionally, live copies are not limited to files.
Folders may also be live copies. A folder named X can be live copied to a folder named Y.
Thus, folder X and folder Y would reference the same underlying data object. This has
the effect that changes to folder X would immediately become visible through folder Y.
This includes adding new files to the folder, renaming files included in the folder, or
deleting files from the folder.
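
By way of illustration only, the live copy behavior described above may be sketched as
several names referencing one shared data object. The class names, method names and
paths below are hypothetical and chosen for this sketch; a Python dictionary stands in
for the database of the volume manager.

```python
# Illustrative sketch only: the classes, method names and paths are assumptions,
# not the implementation described in this specification.

class DataObject:
    """Underlying content shared by every live copy."""

    def __init__(self, content=b""):
        self.content = content


class Volume:
    def __init__(self):
        # path -> DataObject; stands in for the volume manager's database
        self.entries = {}

    def create(self, path, content=b""):
        self.entries[path] = DataObject(content)

    def live_copy(self, src, dst):
        # Both names now reference the same underlying data object.
        self.entries[dst] = self.entries[src]

    def write(self, path, content):
        self.entries[path].content = content

    def read(self, path):
        return self.entries[path].content

    def delete(self, path):
        # Removes one name only; the other live copies keep the data reachable.
        del self.entries[path]


vol = Volume()
vol.create("/docs/A", b"draft 1")
vol.live_copy("/docs/A", "/projects/B")      # A live copied to B
vol.live_copy("/projects/B", "/archive/C")   # B live copied again to C
vol.write("/archive/C", b"draft 2")          # change made through C
vol.delete("/docs/A")                        # deleting A leaves B and C intact
```

In this sketch a change written through any name is read back through every other
name, and deleting one name leaves the remaining names usable.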

[054] The volume manager 221 saves disk space and gains performance by utilizing
deferred copies. According to one embodiment, when a "regular" copy is made of a file
or folder, the file or folder's contents are not immediately duplicated. Only a small
amount of additional disk space is needed for the information in the database regarding
the new files or folders. Both copies share the same data. Only after the data in one of
the files is modified does the volume manager 221 create separate data. The same
applies to copies of an entire folder hierarchy: only when files are modified in one or the
other copy does the volume manager 221 actually allocate space for the new, modified
copy.
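
By way of illustration only, a deferred copy may be sketched as a copy-on-write scheme
in which a copy initially shares the data of the original and separate data is created
on the first modification. The reference count and all names below are assumptions
made for this sketch.

```python
# Illustrative copy-on-write sketch; class and attribute names are hypothetical.

class Blob:
    def __init__(self, data):
        self.data = data
        self.refs = 1          # number of file entries sharing this blob


class FileEntry:
    def __init__(self, blob):
        self.blob = blob


class Volume:
    def __init__(self):
        self.files = {}        # file name -> FileEntry

    def create(self, name, data):
        self.files[name] = FileEntry(Blob(data))

    def copy(self, src, dst):
        # Deferred copy: record a new entry but share the existing blob.
        blob = self.files[src].blob
        blob.refs += 1
        self.files[dst] = FileEntry(blob)

    def write(self, name, data):
        entry = self.files[name]
        if entry.blob.refs > 1:
            # First modification of shared data: allocate separate data now.
            entry.blob.refs -= 1
            entry.blob = Blob(data)
        else:
            entry.blob.data = data

    def read(self, name):
        return self.files[name].blob.data


vol = Volume()
vol.create("report.txt", b"v1")
vol.copy("report.txt", "report-copy.txt")
shared_before = vol.files["report.txt"].blob is vol.files["report-copy.txt"].blob
vol.write("report-copy.txt", b"v2")
shared_after = vol.files["report.txt"].blob is vol.files["report-copy.txt"].blob
```

Unlike a live copy, the two files diverge after the write: the original keeps its data
and only the modified copy receives new storage.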

[055] According to one embodiment, the user interface 210 can be used to tell the
volume manager 221 to freeze a file. Once a file or folder is frozen, no one, not even a
super-user or administrator, can modify or change the state of that file or folder. Thus,
frozen files provide a snapshot of the file as of the indicated time. Furthermore, every
file, including those that are frozen, has an inspectable cryptographically-strong hash
code (using the SHA-1 hash algorithm, for example). The hash code can be used to
verify that the content really is intact, and that no error or hackery has changed the
content. The hash code may also be used for digital signatures.
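
By way of illustration only, the integrity check described above may be sketched with
the SHA-1 function of Python's standard hashlib module. The FrozenFile class and its
methods are hypothetical names chosen for this sketch.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-1 hash code of the content as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


class FrozenFile:
    """Hypothetical frozen file: content is fixed and carries a hash code."""

    def __init__(self, data: bytes):
        self._data = bytes(data)
        self.hash_code = content_hash(self._data)   # recorded at freeze time

    def read(self) -> bytes:
        return self._data

    def write(self, data: bytes):
        # A frozen file rejects modification by any user.
        raise PermissionError("frozen files cannot be modified")

    def verify(self) -> bool:
        # Recompute the hash and compare it with the recorded hash code.
        return content_hash(self._data) == self.hash_code


frozen = FrozenFile(b"contract text")
intact = frozen.verify()

# Simulate corruption of the stored bytes behind the file's back.
frozen._data = b"contract text (altered)"
after_corruption = frozen.verify()
```

The check reports a mismatch whenever the stored bytes no longer hash to the value
recorded when the file was frozen.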

[056] A file's hash code can also be used to identify identical content. According to one
embodiment, the volume manager may identify files with identical content, and link them
together as deferred copies, thereby allowing the duplicate disk space to be freed.
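
By way of illustration only, identification of identical content by hash code may be
sketched as follows. The function names are hypothetical, and a dictionary of file names
to bytes stands in for the volume; sharing one bytes object stands in for linking the
files together as deferred copies.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names whose content has the same SHA-1 hash code."""
    by_hash = defaultdict(list)
    for name, data in files.items():
        by_hash[hashlib.sha1(data).hexdigest()].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


def link_duplicates(files):
    """Make each group of duplicates share a single stored copy of the data."""
    for names in find_duplicates(files):
        canonical = files[names[0]]
        for name in names[1:]:
            files[name] = canonical   # duplicate storage can now be freed
    return files


files = {
    "a.txt": bytes(bytearray(b"same content")),   # two distinct objects with
    "b.txt": bytes(bytearray(b"same content")),   # identical bytes
    "c.txt": b"different content",
}
separate_before = files["a.txt"] is not files["b.txt"]
duplicates = find_duplicates(files)
link_duplicates(files)
shared_after = files["a.txt"] is files["b.txt"]
```

A cautious implementation might also compare the bytes of candidate files before
linking them, rather than relying on the hash code alone.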

[057] According to one embodiment, the frozen file feature provides a simple
mechanism to maintain prior versions of files. Utilizing a version save command in the
user interface 210, a deferred copy of the file is created and frozen so it will never
change. The frozen file is then identified in the database as a past version of the file. A
past version of a file can be accessed to copy, link to or move it. However, it cannot be
modified. When a version is saved, the volume manager 221 may also store additional
information about the version, such as when and by whom it was saved. Also, comments
about the version can be entered and saved by the volume manager 221. In a similar
manner, a folder can also be saved, which preserves a frozen copy of everything in the
folder.
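
By way of illustration only, a version save may be sketched as recording an immutable
snapshot together with who saved it, when, and an optional comment. The class and field
names are hypothetical; an immutable bytes value shared between the file and the version
stands in for the frozen deferred copy.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)          # frozen: attributes cannot be reassigned
class Version:
    data: bytes
    saved_by: str
    saved_at: datetime
    comment: str
    hash_code: str


class VersionedFile:
    def __init__(self, data=b""):
        self.data = data
        self.versions = []       # past versions, oldest first

    def save_version(self, user, comment=""):
        version = Version(
            data=self.data,      # shared, immutable bytes; nothing is duplicated
            saved_by=user,
            saved_at=datetime.now(timezone.utc),
            comment=comment,
            hash_code=hashlib.sha1(self.data).hexdigest(),
        )
        self.versions.append(version)
        return version


doc = VersionedFile(b"first draft")
v1 = doc.save_version("jane", "initial draft for review")
doc.data = b"second draft"       # the current file keeps changing
```

The saved version continues to hold the first draft after the current file is changed,
and an attempt to assign to one of its fields raises an error.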

[058] Because information about associated files, such as versions, is stored in the
database, accessing associated files is simple. A "show versions" option can be selected
in the user interface 210. In some embodiments, a window will then display all extant
past versions, which are all frozen. Any of the prior versions can be moved, opened,
compared to other versions, or otherwise manipulated without changing the content of the
version. Since information is stored about the timing of versions of all files, the volume
manager 221 can provide a view of a folder hierarchy or an entire volume as of a given
time. All of the parts of that view are prior frozen versions.

[059] Similar information for copies of files may also be maintained. A "show
copies" option may be selected from the user interface 210. In some embodiments, a
window will then display a copy pedigree for a particular file. Such a copy pedigree may
include all predecessor files, all descendant files, or some combination. As with versions,
any of the copies can be moved, opened, compared to other copies, or otherwise
manipulated without changing the content of the copy. Since information is stored about
the timing of copies of all files, the volume manager 221 can provide a view of a folder
hierarchy or an entire volume as of a given time. This allows users to view the migration
and evolution of a particular file as well as identify the source of the particular file.

[060] Every time changes are made to files, the volume manager 221 records what was
done. When a file is copied, moved, deleted, or saved, a record is made. The system can
then provide a history of any item, which shows where this item was copied from and to,
who did it, when, and why. For a folder, the history includes a list of items that used to
be in the folder but which were either deleted or moved from the folder. From the history
list, items that have been moved or deleted can be restored, brought back to the folder, or
copied back to the folder.

[061] The volume manager 221 also provides linking capabilities. A link is a reference
to whatever is at the end of the given path. The path can be relative, absolute, or it can be
a URL. In some embodiments, a link can be "sticky" in that it dictates attributes of what
it points to. For example, the link can include a reference to a file type (such as a PDF
file), whether there has to always be something there at the end of the path, and whether
the link will adjust to point to the new location if the referent moves. A link can be
configured to behave like a Mac OS alias, Windows shortcut, or Unix symbolic link or
hard link, appropriate to the platform from which it is accessed. A link can also be
configured to keep a cached copy of whatever was there the last time the link was used,
for example, a web page or a folder on a web site.

[062] The volume manager 221 also provides functionality with respect to folders. One
type of folder implemented by volume manager 221 is a query folder. A query folder can
be created which encapsulates a set of search criteria and includes real-time-updated
results of the search. The search can be a full-text search across one or more volumes, or
it can be a tag search.

[063] Query folders are stored in the volume manager 221 like ordinary folders.
However, their uniquely formatted name or a special tag attribute indicates to the system
that they are query folders and not regular folders. At the time that a query folder is
enumerated, the query is processed, and the selected files are listed as being the content
of the folder. In addition, when a new file is created, or when one of the tags associated
with the query folder changes, the query is evaluated again, and an event is delivered to
the client to indicate that a file should be added to or removed from the query folder.
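
By way of illustration only, a query folder based on a tag search may be sketched as
follows. The query is evaluated when the folder is enumerated and re-evaluated when a
tag changes, at which point an "added" or "removed" event is recorded for delivery to
the client. All class, method and tag names are hypothetical.

```python
# Illustrative sketch; the event tuples and the equality-based tag search are
# assumptions made for this example.

class Volume:
    def __init__(self):
        self.tags = {}             # file name -> {tag type: tag value}
        self.query_folders = []

    def set_tag(self, name, tag, value):
        self.tags.setdefault(name, {})[tag] = value
        for folder in self.query_folders:
            folder.on_change(name)   # re-evaluate every query folder


class QueryFolder:
    def __init__(self, volume, criteria):
        self.volume = volume
        self.criteria = criteria     # required tag type -> tag value
        self.events = []             # events that would be sent to the client
        self._members = set(self.list_contents())
        volume.query_folders.append(self)

    def matches(self, name):
        tags = self.volume.tags.get(name, {})
        return all(tags.get(k) == v for k, v in self.criteria.items())

    def list_contents(self):
        # Enumeration runs the query; no static list of contents is stored.
        return sorted(n for n in self.volume.tags if self.matches(n))

    def on_change(self, name):
        now, was = self.matches(name), name in self._members
        if now and not was:
            self._members.add(name)
            self.events.append(("added", name))
        elif was and not now:
            self._members.discard(name)
            self.events.append(("removed", name))


vol = Volume()
vol.set_tag("spec.doc", "status", "draft")
in_review = QueryFolder(vol, {"status": "review"})
vol.set_tag("spec.doc", "status", "review")
vol.set_tag("plan.doc", "status", "review")
vol.set_tag("spec.doc", "status", "final")
```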

[064] Another type of folder implemented by volume manager 221 is a merge folder. A
merge folder includes items from a 'merge list' of other folders. An item in a folder in
the merge list hides a like-named item in a folder farther down in the merge list. The
merge is real-time, not a snapshot; as things appear and disappear in the merged folders,
they appear and disappear in the merge folder contents. A merge folder can be
configured to allow creation of new items in the merge folder so that they reside in the
first folder in the merge list. A merge folder can also be configured to allow deletion of
items from where they reside or merely to hide them from appearing in the merge folder.
Items from the source folders appear in the merge folder as live copies. A combination
of query folders and merge folders can be used to implement complex queries.

[065] Merge folders are also stored in the volume manager 221. The underlying
"source" folders know about each merge folder they are used by, and are also referenced
by the merge folder. This allows the system to propagate changes in the source folder to
the merge folder. The system can also warn the user about a potential conflict before a
source folder is deleted. The merge folder also includes a list of edits that are applied to
each of the source folders. If a file is deleted from a merge folder, for example, an edit is
stored so that after the contents of all referenced source folders are collected, the edit list
is applied, and the deleted file is removed from the enumeration before the final list is
passed back to the user interface 210 for display to the user.
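
By way of illustration only, enumeration of a merge folder may be sketched as follows.
Source folders are modeled as dictionaries, an earlier folder in the merge list hides a
like-named item in a later folder, and a stored edit list hides items deleted from the
merge folder. The class and method names are hypothetical.

```python
# Illustrative sketch; the "hide only" deletion mode is the one shown here.

class MergeFolder:
    def __init__(self, merge_list):
        self.merge_list = merge_list   # ordered list of source folders (dicts)
        self.hidden = set()            # edit list: names hidden from the merge

    def list_contents(self):
        merged = {}
        # Walk from the last folder to the first so earlier folders win.
        for folder in reversed(self.merge_list):
            merged.update(folder)
        # Apply the edit list after all source contents are collected.
        for name in self.hidden:
            merged.pop(name, None)
        return merged

    def create(self, name, content):
        # New items reside in the first folder of the merge list.
        self.merge_list[0][name] = content

    def hide(self, name):
        # Record an edit instead of deleting the item where it resides.
        self.hidden.add(name)


local = {"readme": "local readme"}
shared = {"readme": "shared readme", "logo": "logo v1"}
merged = MergeFolder([local, shared])

first_view = merged.list_contents()
shared["notes"] = "meeting notes"      # change in a source folder appears live
merged.hide("logo")
merged.create("todo", "ship it")
final_view = merged.list_contents()
```

Because the contents are recomputed from the source folders at each enumeration, the
merge reflects later changes in those folders rather than a snapshot.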

[066] One aspect of the invention provides version control. A folder can be designated
as a "Repository." In one embodiment, a repository folder requires that changes be made
by doing a "drag-update" to the top-level repository folder itself; other changes to its
contents (i.e., a piece at a time) are not allowed. To "check out a copy," a user makes a
"regular" copy of the repository folder. Because of deferred copies, this operation is very
fast. Users make whatever changes they need to make anywhere within the copy of the
folder. Then the copied folder is dragged and dropped back to the repository folder. The
user interface pops up a "check in" window that asks the user to include a note about the
changes that were made. During the check-in process, the volume manager compares the
version history of the new files with the versions that are already in the repository. This
comparison allows it to identify conflicts. The user interface compare-and-merge tools
are used to resolve any conflicts that may have arisen as a result of another user checking
out the same hierarchy and changing any of the same files.

[067] The file management system of the present invention allows folders, as well as
files, to have type. The type is stored in the database with the appropriate folder
information. A type can configure a folder to limit what can be in it and to optionally
require certain contents. For example, a 'Group Role' folder is allowed to include only
'User' files and 'Group Access' folders, as discussed below.

[068] The listing of items in a folder is greatly enhanced by the file management system
of the present invention. Any of the additional information stored with respect to files
can be saved. Furthermore, special orderings of files can be used in displaying a list.
The items in folders can be sorted by their name, size, modify time and certain other
information, as in most file management systems. However, the user can also configure
the user interface 210 to display tag names and values associated with the files in a
folder. When the folder display is in this mode, the tags appear as column headings, and
the tag values appear in those columns. The files can then be sorted based on those tag
values, by clicking on the tag name at the top of the column. This is implemented in the
user interface 210 as an extension to Windows Explorer known as a "Namespace
Extension." The extension is told the name of the folder. It then
sends a request to the volume manager 221 for a list of all of the tags used in that folder,
and the value of each tag for every file in the folder. It uses that information to render the
user interface 210 as described above.

[069] The system can also display the date and time when an item was added to a
folder, not just when it was created.

[070] When applied on a network, the file management system is able to cache files for
improved access while maintaining control. When a server volume is accessed, the
volume manager 221 on the client creates an entry for the server volume in its local disk
cache. From then on, even if disconnected from the server, the client can change
anything that appears to be on the server volume, using whatever is cached locally. The
system can also ensure that certain files from the server volume are always cached on the
client, in case the client is disconnected or the server goes down. If a user wishes to
always have an item available, the "keep local" option is selected from the user interface
210. For a folder, all of that folder's contents, recursively, are affected when the "keep
local" option is selected. If a user also wants to protect against the item being deleted,
they should make a live copy. The client volume manager and the server volume
manager work unobtrusively in the background together with the coherency manager to
ensure that 'keep local' items remain in sync with the server. If the client is disconnected
from the network, the coherency manager will orchestrate synchronization of the volume
manager with the client cache upon reconnection. If changes have been made in the local
cache while disconnected, there may be conflicts with changes on the server. In this case,
the user interface 210 will work with the user to reconcile the differences. This is done
in part through a set of compare-merge tools that are integrated into the user interface
210. These tools allow the user to visualize the changes, and to either select the right
version or merge changes from one file into another.

[071] Since information about all changes to files and folders is maintained by the
volume manager 221, undoing actions is fairly simple. The "Undelete" option in the user
interface 210 first provides a listing of deleted items. While files are still deleted, they
can't be viewed or modified. When the desired file or folder is selected, the undelete
command from the user interface 210 makes it viewable and modifiable again.
Similarly, the same process can be used to reinstate a previous version of a file from a
version listing. Also, the various actions taken with respect to a file or folder can be
viewed and be reversed with the "undo" option.

[072] Any change to a file or a folder is an event that can trigger another action by the
file management system. Many pre-programmed actions can be selected and customized
with drag and drop and form-fill-in actions. Actions can also be programmed, as one
would in a spreadsheet, using JavaScript, Java, or Visual Basic. The system can react in
real time, similar to a recalculation of a spreadsheet when a cell is changed.

[073] In some embodiments of the invention, every item in the memory volume has
tags. A tag is a coupling of a tag type and a tag value. There are many built-in tag types,
such as text, user, date, and icon. A tag can be added to an item, perhaps creating a new
tag type in the process, and its value can be modified (except for some built-in "system"
tags).

[074] An email integration package allows email messages to be brought into the
system to be manipulated as files in folders and also to be associated with files and
folders. To determine whether there has been any email discussion about a file, right-
click on the file and select the "Messages" command. The user interface will then
provide the email history associated with this file. By clicking the "New Message"
button on the window toolbar, the user may select the people to whom they want this
message to go (the system knows who's participated in the discussion so far). The user's
usual email application (such as Microsoft Outlook) opens up with a new message in it,
and in the body of the message there is a special URL with a special protocol (such as
"itc://") that refers to the file being discussed in the email.

[075] Because the present invention is a peer-to-peer system, any user of the system
reading the messages including "itc://" URLs can navigate easily from the message to the
referenced file (not a copy, but the identical file in the space shared by the peers).

[076] In fact, the URL in the message refers to a specific version of the file, the version
that was current when the email was written. If the URL is opened, the user interface
brings up a Windows Explorer window to the folder that includes the file, selects the file,
and opens a "choices" window. The choices window offers to show other emails about
the file, to show the file as it was when the email was sent, or if the file has been revised
since then, the system shows the version history and allows a selection between the
URL's version and the current version and offers to show a comparison of the two
versions.

[077] The system provides access control through use of management folders. In one
embodiment, every volume has a management folder with two subfolders: users and tags.
The file management system grants access to an item (file or folder) based on who the
user is and the groups to which the user belongs. There are three kinds of typed folders
found in the users subfolder: "group", "volume group", and "group from authentication
server" (the latter two are subclasses of folder type "group"). These folders can include
other group folders and special files of type "user".

[078] The system may rely on one or more designated outside authorities to authenticate
users. This authority can be the local computer, a Windows Active Directory server, a
Kerberos server, LDAP, etc. For every authentication source, there is a corresponding
typed folder of type "volume group." For each user authenticated by that source, there is
a corresponding user file in the folder. The user file is an XML file that includes
authentication source information and user details, such as full name, phone numbers, etc.
For each group maintained by the authentication server, there is a typed folder of type
"group from authentication server" in which there are live copies of all the users that are
members of the group. For example, if the system has been configured to use the
Windows domain Active Directory server called CORPORATE, the users area might
include these:

/users/corporate/Ron
/users/corporate/Jane
/users/corporate/Fred
/users/corporate/admin/Fred

[079] The /users/corporate/ folder (which is a [...] authenticated users") and
everything under it includes information that identifies the
CORPORATE Windows domain as their source. The /users/corporate/admin/ folder is a
typed folder of type "group from authentication server", and the user file Fred in it is a
live copy of /users/corporate/Fred (because the files represent the same data). A typed
folder of type "volume group" is a convenient way to establish groups using the user
interface. These groups are known only to the system, not to the authentication source.
They can be useful because they allow groups within groups.

[080] An authentication group folder is special in how it treats the user files and group
folders included in it, and it allows only those types of items in it. Unlike traditional
systems, the present invention allows a group to include other groups as well as users.
The live copy feature makes organizing users and groups easy. Each item (folder or file)
has one or more owners. An owner is a user or group. An owner is allowed to change
access settings for itself and for other users and groups.
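
By way of illustration only, groups that include other groups, and ownership by a user
or group, may be sketched as follows. The class names, the permission strings and the
rule that any member of an owning group may change access settings are assumptions made
for this sketch.

```python
# Illustrative sketch; all names and the permission model are hypothetical.

class User:
    def __init__(self, name):
        self.name = name


class Group:
    def __init__(self, name, members=()):
        self.name = name
        self.members = list(members)   # users and/or other groups

    def contains(self, user, _seen=None):
        seen = _seen if _seen is not None else set()
        if id(self) in seen:           # guard against groups that nest in a cycle
            return False
        seen.add(id(self))
        for member in self.members:
            if member is user:
                return True
            if isinstance(member, Group) and member.contains(user, seen):
                return True
        return False


class Item:
    def __init__(self, name, owners):
        self.name = name
        self.owners = list(owners)     # each owner is a user or a group
        self.access = {}               # principal name -> set of permissions

    def is_owner(self, user):
        return any(
            owner is user or (isinstance(owner, Group) and owner.contains(user))
            for owner in self.owners
        )

    def set_access(self, actor, principal, permissions):
        if not self.is_owner(actor):
            raise PermissionError(f"{actor.name} is not an owner of {self.name}")
        self.access[principal.name] = set(permissions)


fred, jane, ron = User("Fred"), User("Jane"), User("Ron")
admin = Group("admin", [fred])
staff = Group("staff", [admin, jane])    # a group that includes another group

plan = Item("plan.doc", owners=[staff])
plan.set_access(fred, ron, {"read"})     # Fred owns via admin, nested in staff
```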

[081] The system uses event-driven actions extensively, and custom actions can be
established to do simple but powerful things. For example, the system can notify the
right people when a work product file is ready for review. Using the event-driven actions,
complex workflow automation can be easily built into the user's everyday work area,
folders and files.

[082] The system tracks various aspects about the usage of files and folders by users.
Furthermore, it can be customized to ask for more specific information. Typical
document management systems are limited because they are not able to control the files
on users' desktop computers. Users often have to extract files from the document
management system onto their desktop computer (thereby out of reach and out of the
control of the document management system) and then back into the document
management system at some later time. According to one aspect of the invention, files
never leave the system.

[083] The present invention eliminates bad copies in a variety of ways. For example, in
a conventional system, a user may wish to copy an item from a server or a CD-ROM to
the user's local machine. If the user's purpose for making the copy is convenience, the
invention provides a sync link from the item on the server to the local volume. If the
user's purpose is for speed of access, the invention may provide a cached copy on the
local volume. If the user's purpose is to protect against the server going down or the item
being deleted from the server or unavailability of the CD-ROM, the invention may
provide a live copy of the item on the local volume. If the user's purpose is to have
access to the item when not on the network, the invention provides the keep local feature.

[084] In other examples, the user may wish to copy an item from the local machine to
the server or a removable disk. If the user's purpose for making the copy is for backup,
the invention provides automatic backup to the server. If the user's purpose is to publish
the item for others to access, the invention provides a live copy on the server and
furthermore may provide permissions to control which users have access. If the user's
purpose is to capture and maintain a version, the invention provides the snapshot feature.

[085] In other examples, the user may wish to copy an item from one folder to another
folder for organizational convenience (i.e., have all related files in one folder). In this
case, the invention provides live copies or, alternatively, special folders that have links to
the various items that should be included.

[086] In another example, the user may wish to copy items to a zip file or other archive
format for reasons similar to those described above. If the user's purpose is to keep a
snapshot of a current version of the items, the invention provides the freeze or save
features. If the user's purpose is to send these items to another user, the invention
provides a link to the saved version that then can be forwarded to the other user. If the
user's purpose is to send these items in a zip format, the invention provides an "extract
as..." folder feature.

[087] Fig. 5 illustrates a block diagram of an embodiment of a file management system in
further detail. As illustrated therein, file management system 500 interfaces with a file
system interface 502. File system interface 502 allows file management system 500 to
communicate with other system devices (not illustrated) using various protocols. In one
embodiment of the present invention, an SMB protocol interface box may be used. As is
known, SMB is a standard protocol used, for example, by Windows to implement file
sharing. With the SMB protocol interface box, file management system 500 appears like
a network drive to other system devices. As would be apparent, other interfaces could be
used, including those that would support different file-access protocols or that would
allow file management system 500 to appear as a native file system.

[088] File system interface 502 provides a standard API that functions to implement
standard file system calls (e.g., read/write, open, close, etc.). File system interface 502
passes system calls that it receives from other system devices to a disk adapter 504
(sometimes referred to herein elsewhere as a grok adapter) that redirects and implements
those system calls in accordance with the present invention.

[089] In one embodiment of the present invention, disk adapter 504 implements system
calls or "requests" such as those illustrated in request block 506. These requests include:
"list" which is used to enumerate a folder; "stat" which gets information about a
particular file such as size, type, etc.; "mkdir" which creates a directory; "delete" which
deletes a file, a folder, etc.; "open" which opens or creates a file; and "close" which
closes a file. These are referred to herein as file system requests. Other requests such as
"read," "write," "seek," etc., may also be included as would be apparent and are referred
to as file or "blob" requests. In general, the operation and use of these requests by other
system devices are well known.

[090] In one embodiment of the present invention, certain requests and in particular,
read and write requests, are actually diverted inside disk adapter 504 directly to streams
that exist on an underlying file system 508. In one embodiment, file system 508 is an
NTFS-based file system. Other file systems such as a FAT file system may be used as
would be apparent. However, the NTFS file system provides a more robust system, with
some built-in integrity-preserving capabilities, than do FAT file systems. Furthermore,
NTFS more readily allows millions of files to be located in a single folder.

[091] When disk adapter 504 detects read or write requests, they are diverted directly to
file system 508. In one embodiment, these requests do not pass through the remainder of
file management system 500, in part, to avoid processing of large data streams, or
"blobs," by a transactional database. However, in other embodiments, for example, in
those that implement a custom object store, these blobs may pass through the file
management system 500 in order to provide transactional integrity (i.e., all transactions
fully complete or fully fail) as will become apparent from the discussion below.

[092] One aspect of file management system 500 is to manage all of the metadata that
surrounds that blob as opposed to managing the blob itself. This metadata may include,
for example, filename, tags associated with a file, a folder in which the file resides, a time
of its creation, a time of its last modification, etc. In some embodiments, file
management system 500 may also manage blob creation (e.g., opening a zero length file)
and deletion.

[093] When a request from a file system arrives, disk adapter 504 creates a request
object that encapsulates any components of the request for operation with a transactional
database. In some embodiments of the present invention, this encapsulation allows file
management system 500 to be fully asynchronous in that it allows request objects to be
queued for subsequent completion without tying up system operation. In some
embodiments, disk adapter 504 creates a different request object for each type of
incoming request. In one implementation, each request ("list," "stat," "mkdir," etc.)
corresponds to a subclass of the base class "request."

[094] For example, a "mkdir" request object would encapsulate all of the parameters for
the mkdir request including a name of the directory to be created and a user name
associated with the person requesting the creation. The request object is then passed to a
system call dispatcher 507. System call dispatcher 507 passes the request object to a
thread pool 510 to be executed. Thread pool 510, in turn, wraps each request object or
each action associated with the request object inside a transaction for use with the
transactional database.

[095] In one embodiment, thread pool 510 includes a parallel set of objects derived
from the transaction wrapper. These parallel objects are referred to as task objects. They
are derived from another class of objects referred to as a transaction wrapper object.
Thus, system call dispatcher 507 passes the request object to the task object, which is then
handed off to a thread pool to be executed. One aspect of this embodiment is that the
task objects may sit in a queue while awaiting processing by thread pool 510. As would
be apparent, thread pool 510 also provides a mechanism by which file management
system 500 may asynchronously operate, thereby alleviating server overuse and
providing improved performance by minimizing connections to the underlying object
store.

[096] Thread pool 510 grabs task objects one at a time and calls a run method
associated with the task object as would be apparent. This run method within the
transaction wrapper handles the object store transactions. More particularly, the run
method calls a do_transaction method, which is overridden inside these task objects. In
this way, each of the task objects does not require all of the external wrapper code that
knows how to manage the transactions; the particular task object performs its specific task
(e.g., creates the directory by doing the appropriate object manipulations) and then
returns. So the transaction wrapper creates or starts a transaction, calls its specific
do_transaction method, and then calls the commit transaction routine.
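
The wrapper structure described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions rather than the actual implementation: the names Session, TransactionWrapper, and MkdirTask are hypothetical, and the session merely records calls in place of a real object store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class TransactionWrapperSketch {

    /** Hypothetical stand-in for an object store session; it only records calls. */
    static final class Session {
        final List<String> log = new ArrayList<>();
        synchronized void begin() { log.add("begin"); }
        synchronized void commit() { log.add("commit"); }
        synchronized void record(String action) { log.add(action); }
    }

    /** Owns the transaction handling; subclasses supply only the task-specific body. */
    abstract static class TransactionWrapper implements Runnable {
        protected final Session session;

        TransactionWrapper(Session session) { this.session = session; }

        @Override
        public final void run() {
            session.begin();   // start the transaction
            doTransaction();   // task-specific object manipulations
            session.commit();  // commit the transaction
        }

        protected abstract void doTransaction();
    }

    /** Task object for a "mkdir" request; it carries the request parameters. */
    static final class MkdirTask extends TransactionWrapper {
        private final String directory;
        private final String user;

        MkdirTask(Session session, String directory, String user) {
            super(session);
            this.directory = directory;
            this.user = user;
        }

        @Override
        protected void doTransaction() {
            session.record("mkdir " + directory + " by " + user);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Session session = new Session();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(new MkdirTask(session, "/reports", "alice"));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(session.log);
    }
}
```

Because run is final in the wrapper, a task object cannot skip or reorder the begin and commit steps; it only overrides the body.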

[097] When two tasks or threads attempt to modify the same object(s), the transactional
database will detect it and prevent the transaction from succeeding by throwing an
exception. The transaction wrapper manages those exceptions by, for example,
reattempting the transaction some number of times. In one embodiment, if the
transaction continues to fail, the exception manager attempts to obtain exclusive access to
the database, thereby blocking out any other transactions while it completes the
transaction.

[098] Before discussing each of the task objects in further detail, a volume manager
object 515 and an object store 520 are described. According to one embodiment of the
invention, volume manager object 515 manages much of the non-persistent data that is
associated with volume 525, while volume 525 stores the persistent data.

[099] When disk adapter 504 is first initialized, it receives a volume name representing
a volume 525 and is instructed to initialize volume 525. Next, disk adapter 504 opens
volume 525 in similar fashion to a conventional file system mount command, by calling
volume manager object 515. During this initialization, disk adapter 504 calls a static
method inside volume manager object 515 to ask for an instance of volume manager
object 515 associated with the volume name. The static method either returns an existing
volume manager object or creates one and initializes it. If the volume manager object
exists, it is simply looked up in a hash table by the volume name and returned. If not, the
volume manager goes out to the database, establishes a connection to the object store 520
and does a lookup to see if a volume object has been stored there. If it has been stored in
object store 520, then that volume object is read in and stored in the volume manager. So
where the volume object has been previously created, mounting comprises either reading
that volume object or getting a reference to that persistent volume object from the object
store and storing a reference to that volume object in the volume manager.
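
The lookup-or-create behavior of the static method can be sketched as follows. This is a sketch under assumptions, not the described system's code: the class names, the forName method, and the in-memory map standing in for object store 520 are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class VolumeManagerSketch {

    /** Hypothetical stand-in for the persistent volume object. */
    static final class Volume {
        final String name;
        Volume(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the object store, keyed by volume name. */
    static final Map<String, Volume> OBJECT_STORE = new ConcurrentHashMap<>();

    static final class VolumeManager {
        // Hash table of already-mounted volume managers, keyed by volume name.
        private static final Map<String, VolumeManager> MANAGERS = new ConcurrentHashMap<>();

        final Volume volume;

        private VolumeManager(Volume volume) { this.volume = volume; }

        /** Returns the existing manager for the name, or mounts the volume and creates one. */
        static VolumeManager forName(String volumeName) {
            return MANAGERS.computeIfAbsent(volumeName, name -> {
                // Read the stored volume object, or create and store it if absent.
                Volume v = OBJECT_STORE.computeIfAbsent(name, Volume::new);
                return new VolumeManager(v);
            });
        }
    }

    public static void main(String[] args) {
        VolumeManager first = VolumeManager.forName("docs");
        VolumeManager second = VolumeManager.forName("docs");
        System.out.println(first == second);
    }
}
```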

[0100] In one embodiment, object store 520 corresponds to an object store. In this
embodiment, since each object reference is owned by a particular session, it is not
possible to pass a standard reference to an object from one session to another. In this
embodiment, object store 520 provides a mechanism referred to as a shared object
reference that allows access to these persistent objects with references unique to each
session. After the volume manager 515 mounts the volume 525, a reference to the
volume 525 is stored in a shared object reference in the volume manager 515.

[0101] When the volume object does not already exist in object store 520, volume
manager 515 creates volume object 525, causes it to be initialized, and stores it in object 
store 520. When volume 525 is initialized, a root slot is created along with a root folder 
and a number of folders and tags associated with a tag volume. 

[0102] Volume manager object 515 also manages access to sessions of object store 520. 
In one embodiment, a read/write lock is created and anchored in the volume manager.
Any class in file management system 500, for example, transaction wrapper 510, starts a
transaction by calling a method in the volume manager to begin the transaction. More
particularly, the volume manager includes transaction begin and transaction commit
methods. When the transaction begin is called, the volume manager must acquire a read 
lock before it calls the underlying object store begin transaction method. 

[0103] A read/write lock provides for multiple readers. So while multiple read locks can 
be acquired, only one write lock can be acquired. This lock operates as follows. When a 
write lock acquire is called or issued, it suspends or waits until all read locks have been
released. Subsequent read lock acquires that arrive after the write lock acquire is called
are suspended until the write lock acquire completes and the write lock release completes.

[0104] In one embodiment of the invention, a read lock is acquired in the transaction
begin method and the read lock is released in the transaction commit method. In this
way, multiple threads and multiple sessions are allowed to be active at the same time.
However, to accommodate instances where a write conflict occurs such as described
above, retry logic is incorporated into the transaction wrapper. Thus, after trying and
failing to execute a transaction multiple times, the transaction wrapper calls an exclusive
begin method inside the volume manager that calls a write lock acquire on the lock object
that is used for the normal transactions. This has the effect of letting all of the normal
transactions that are in progress complete, at which point in time, that session gains
exclusive access to the database, and it can then complete its transaction without fear of
interference from other sessions.
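
The retry-then-exclusive behavior can be sketched with the standard java.util.concurrent.locks.ReentrantReadWriteLock. This is an illustrative sketch only: the VolumeManager methods, ConflictException, and runWithRetry are hypothetical names, and the lock is created in fair mode as one way to approximate the described queuing of readers behind a waiting writer.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

final class ExclusiveRetrySketch {

    /** Hypothetical exception standing in for the object store's conflict signal. */
    static final class ConflictException extends RuntimeException { }

    /** Hypothetical volume manager that anchors the shared read/write lock. */
    static final class VolumeManager {
        // Fair mode: readers arriving after a waiting writer queue up behind it.
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

        void begin() { lock.readLock().lock(); }
        void end() { lock.readLock().unlock(); }            // commit or abort
        void beginExclusive() { lock.writeLock().lock(); }
        void endExclusive() { lock.writeLock().unlock(); }
    }

    /** Tries the body under a read lock up to maxAttempts times, then once exclusively. */
    static <T> T runWithRetry(VolumeManager vm, int maxAttempts, Supplier<T> body) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            vm.begin();
            try {
                return body.get();
            } catch (ConflictException e) {
                // Another session modified the same objects; try again.
            } finally {
                vm.end();
            }
        }
        vm.beginExclusive();  // waits for in-progress normal transactions to finish
        try {
            return body.get();
        } finally {
            vm.endExclusive();
        }
    }

    public static void main(String[] args) {
        VolumeManager vm = new VolumeManager();
        int[] calls = {0};
        String result = runWithRetry(vm, 3, () -> {
            if (++calls[0] <= 3) {
                throw new ConflictException();  // simulate three write conflicts
            }
            return "committed on call " + calls[0];
        });
        System.out.println(result);
    }
}
```

In this sketch the body simulates three conflicts, so the fourth call runs while the write lock is held and no other transaction can be in progress.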



[0105] As mentioned above, one embodiment of object store 520 may comprise an object 
store. In this embodiment, object store 520 stores Java objects in a persistent store on 
disk using a sophisticated caching and persistence mechanism. Object store 520 allows 
for multiple sessions with each single session having a consistent view of the database. 
As a session begins a transaction, object store 520 creates a snapshot of the database that
remains consistent until the end of that transaction. When the transaction commits, all of
the objects changed by the transaction are written to the database in an atomic fashion
using logging mechanisms for recovery.

[0106] In one embodiment of the invention, the volume manager provides in general a
one-to-one association between threads and sessions. Because each session has a
consistent view of the database, it cannot damage some other session.

[0107] Most of the task objects discussed above include a path name as an input. One
function the file management system 500 performs is to map conventional path names
(e.g., c:/folder/subfolder/file.doc, etc.) into database objects of various kinds. The
volume manager 515 parses the path name and performs various table lookups to identify
a node object. The volume manager begins at a root object anchored in the volume
object and "walks" the graph of objects from the root down to the node object. The
objects that the volume manager walks through while parsing are illustrated in Fig. 5 as
file system data structures 530.
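
The path walk can be illustrated with a simplified sketch. The Node class and resolve method below are hypothetical and collapse the slot, entry, and container objects into a single node type; the sketch shows only the component-by-component lookup from the root.

```java
import java.util.HashMap;
import java.util.Map;

final class PathWalkSketch {

    /** Hypothetical node; a folder maps child names to nodes, a file has no children. */
    static final class Node {
        final String name;
        final Map<String, Node> children = new HashMap<>();

        Node(String name) { this.name = name; }

        Node add(String childName) {
            Node child = new Node(childName);
            children.put(childName, child);
            return child;
        }
    }

    /** Walks from the root one path component at a time; returns null if a component is missing. */
    static Node resolve(Node root, String path) {
        Node current = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;  // skip the empty component produced by a leading slash
            }
            current = current.children.get(part);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Node root = new Node("");
        root.add("folder").add("subfolder").add("file.doc");
        Node found = resolve(root, "/folder/subfolder/file.doc");
        System.out.println(found == null ? "not found" : found.name);
    }
}
```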

[0108] File system data structures 530 derive from a super class called file system node, 
or FS node, and include a slot object 532, an entry object 534, and an item object 536 that
includes a container object 537 and a stream object 538. These objects in file system data
structures 530 represent files or other data structures that reside on a physical disk.

[0109] Slot object 532 manages a name of a file or a folder. Entry object 534 manages 
tags and attributes. Tags are described in detail below. Attributes describe whether the 
file is frozen, read only, etc. Container object 537, which corresponds to folders,
manages all of the data structures associated with a folder. Stream object 538, which
corresponds to blobs, manages all of the objects or all of the items or all of the pieces of
data associated with a blob including, for example, the name of the blob on the native file
system.

[0110] In one embodiment of the invention, each file or folder corresponds to a triple
including a slot 532, an entry 534 and an item 536. More particularly, each file
corresponds to a triple of a slot, an entry and a stream 538, while each folder corresponds
to a triple of a slot, an entry, and a container 537. The objects forming a triple are linked
together in various ways to achieve some of the aspects of the present invention including 
live copies and deferred copies. 

[0111] Container 537 allows file management system 500 to map path name components 
into slots 532. In some embodiments, container 537 also includes information about 
whether or not deleted files should be shown when the folder is enumerated. In other 
embodiments, container 537 identifies a type of the folder, for example, whether the
folder is a normal folder, a query folder, or a search folder. Container 537 may also
include maintenance data that takes a file or folder name and maps it to a slot to facilitate
certain types of lookups. Container 537 may also include methods within the container
class that, for example, enumerate the folder.

[0112] Stream 538 is relatively simple by comparison to container 537. In one
embodiment, stream 538 includes a string that identifies the name of the file on the disk
in file system 508 where the actual blob resides. Stream 538 may also include a hash ID.
In one embodiment, this is a cryptographically strong hash of the contents of the file.
Each time a file is modified, this hash value is recalculated, to allow the tracking of 
identical files according to the invention. 

[0113] Entry 534 manages any tags that are attached to a file. Since multiple slots 532
can refer to the same entry 534, the entry object also includes a list of all of the slots 532
referring to that entry 534. This may occur, for instance, with hard links. Entry 534 may
also include a reference to the underlying item 536, and references to a revision chain
(e.g., the previous version to this one and the next version). According to one
embodiment of the invention, each entry 534 lives somewhere on a revision chain; it
may be the only object on that chain or one of many. In some embodiments, the revision
chain is linear. In other embodiments, the revision chain may include branches that may
allow an entry to reside on any number of revision chains. In further embodiments, a
similar mechanism may provide for a copy history that records where this entry was
copied to, where it was copied from, etc. Each entry 534 may also include one or more
attribute flags including a frozen attribute, a repository attribute, a free text indexer
attribute, and a read only attribute.

[0114] Entry 534 also manages a hash table that maps tag names to their corresponding
data structures as will be described in further detail below. Entry 534 may also include
methods for manipulating revision lists, for setting tags, for removing tags, for copying
tags to another entry, and for updating dynamic folders.

[0115] File management system 500 also includes a tag object 540. Tags correspond to a
name/value pair that is associated with either a file or a folder. As discussed above, entry
534 is the primary object to which tags are attached. Because both files and folders have
an entry object, they can both have tags. According to the invention, tag look-ups are
used in many different places and for many different reasons in the system. As a result,
their implementation requires speedy operation. In order to provide the necessary speed,
in one embodiment of the invention, all tag names are stored in a large bi-directional hash
table. In other words, the hash table allows the identification of all objects that have a
particular tag associated with them as well as the identification of all tags associated with
a particular object.

[0116] In one embodiment of the invention, a hash table is anchored in the volume object
525, and is used to look up all tag names. This hash table receives a tag name and returns
a single name holder object 541. Name holder 541 includes the name of the tag and a set
of all of the associated value holders 542 for that name. Value holder 542 includes the
value of the tag. In other words, name holder 541 includes the name of the tag and value
holder 542 includes the value of the tag. In one embodiment of the invention, a single
name can be associated with many values.

[0117] Tags can be attached to either entry objects 534 or slot objects 532. Tags that are
attached to an entry object are shared by all slots linked to that entry. When referenced
with respect to tags, slots and entries together are referred to as taggable objects. Tags
attached to a slot are visible only for that slot. File names, for example, may be stored as
slot tags, since they are different for each slot. File type and file size may be stored as
entry tags, since they do not change based on the name of the file or the folder in which it
is located. Slot tags are identified by the prefix "slot." For example, "slot.name"
includes the file name. Most other tag names are attached to entry objects.

[0118] Each value holder 542 includes a value and a reference to a collection of taggable
objects (entry objects 534 or slot objects 532) that share that same name/value pair. This
allows file management system 500, then, to easily and quickly determine which entry or
slot object is associated with a particular name/value pair by iterating over the set of
value holders held by the name holder. In addition, this allows all of the entry or slot
objects that are associated with a particular tag or any value of a particular tag to be
determined.

[0119] Using these data structures, a given tag name may be associated with multiple tag 
values at the same time for each entry. For example, while it is intuitive that a name can 
have one value for one file and a different value for a different file, a single tag name can 
also have multiple values for the same file. 

[0120] To accommodate a reverse process, a hash table is anchored in taggable objects,
whose keys are tag names, and whose values are sets of value holder objects for each of
the values that is referenced by that taggable object. This allows file management
system 500 to identify all of the tags that are associated with an entry or slot. More
particularly, the value holder object has a reference that points back to its corresponding
name holder. So from a taggable object, all of the value holder objects can be determined
which provides the values of the tags, and from those, the tag name and other files with
the same tag name can also be quickly determined.
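
The two lookup directions can be sketched with ordinary Java collections. This is a simplified, in-memory illustration under stated assumptions: the class and method names are hypothetical, tag values are plain strings, and persistence in the object store is omitted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TagIndexSketch {

    /** Holds a tag name and every value holder registered under that name. */
    static final class NameHolder {
        final String name;
        final Map<String, ValueHolder> values = new HashMap<>();
        NameHolder(String name) { this.name = name; }
    }

    /** Holds one value, a back reference to its name, and the objects sharing the pair. */
    static final class ValueHolder {
        final NameHolder nameHolder;
        final String value;
        final Set<Taggable> taggables = new HashSet<>();
        ValueHolder(NameHolder nameHolder, String value) {
            this.nameHolder = nameHolder;
            this.value = value;
        }
    }

    /** A slot or an entry; its own table supports the reverse lookup (object to tags). */
    static final class Taggable {
        final String label;
        final Map<String, Set<ValueHolder>> tags = new HashMap<>();
        Taggable(String label) { this.label = label; }
    }

    /** Forward table anchored, in the described system, in the volume object. */
    static final class TagIndex {
        private final Map<String, NameHolder> byName = new HashMap<>();

        void setTag(Taggable target, String name, String value) {
            NameHolder nh = byName.computeIfAbsent(name, NameHolder::new);
            ValueHolder vh = nh.values.computeIfAbsent(value, v -> new ValueHolder(nh, v));
            vh.taggables.add(target);                                          // forward link
            target.tags.computeIfAbsent(name, n -> new HashSet<>()).add(vh);   // reverse link
        }

        /** All objects carrying the given name/value pair. */
        Set<Taggable> find(String name, String value) {
            NameHolder nh = byName.get(name);
            ValueHolder vh = (nh == null) ? null : nh.values.get(value);
            return (vh == null) ? Collections.<Taggable>emptySet() : vh.taggables;
        }

        /** All objects carrying the given tag name with any value. */
        Set<Taggable> findAnyValue(String name) {
            Set<Taggable> result = new HashSet<>();
            NameHolder nh = byName.get(name);
            if (nh != null) {
                for (ValueHolder vh : nh.values.values()) {
                    result.addAll(vh.taggables);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        TagIndex index = new TagIndex();
        Taggable a = new Taggable("a.doc");
        Taggable b = new Taggable("b.doc");
        index.setTag(a, "project", "alpha");
        index.setTag(a, "project", "beta");   // one name, two values on the same object
        index.setTag(b, "project", "beta");
        System.out.println(index.find("project", "beta").size());
        System.out.println(a.tags.get("project").size());
    }
}
```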

[0121] In addition to tags, file management system 500 includes mechanisms for causing
side effects to normal file system operations. These mechanisms are referred to as
triggers. In one embodiment of the invention, a trigger 545 is implemented around
various requests. The triggers can be invoked before and/or after each of the various
requests, for example, to veto the operation, to indicate or record that the request either is
about to happen or just completed, or to cause various more complex actions to take
place, such as setting tags or creating new files or performing operations over a network.
Triggers may also be invoked if changes are made to various tags, either globally
(regardless of the file to which the tag is attached) or locally (only when the tag is
attached to a specific file), as would be apparent.

[0122] In one embodiment of the invention, trigger 545 includes a close trigger 546 and
an email trigger 547. When a file is modified and closed, then close trigger 546 is
invoked. When a file is moved from one folder into another, then email trigger 547 is
invoked.

[0123] In one embodiment of the present invention, when close trigger is invoked, it can
call an external program whose purpose is to determine the MIME type of the file.
Volume manager 515 makes an initial assumption about the type of the file based on its
file extension, based on a list that maps an extension string to a human-readable file type,
and another list that maps an extension to a MIME type. However, if a file's extension is
not in those lists, the close trigger will call an external program that opens the file, reads
the first few bytes, and, based on a set of rules, determines what the MIME type of the
file is.
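
The extension lookup with a content-based fallback can be sketched as follows. This is an illustration only: the map contents, method names, and the two signature rules (the "%PDF" and PNG file signatures) are assumptions chosen for the example, and the fallback runs in-process here rather than as an external program.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class MimeTypeSketch {

    // Hypothetical extension list; the described system keeps a similar list.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("html", "text/html");
    }

    /** Uses the extension when it is known, otherwise inspects the first bytes. */
    static String mimeType(String fileName, byte[] head) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String known = BY_EXTENSION.get(ext);
            if (known != null) {
                return known;
            }
        }
        return sniff(head);
    }

    /** Applies a small set of rules to the leading bytes of the file. */
    static String sniff(byte[] head) {
        if (startsWith(head, "%PDF".getBytes(StandardCharsets.US_ASCII))) {
            return "application/pdf";
        }
        if (startsWith(head, new byte[] {(byte) 0x89, 'P', 'N', 'G'})) {
            return "image/png";
        }
        return "application/octet-stream";  // generic fallback when no rule matches
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdfHead = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(mimeType("notes.txt", new byte[0]));
        System.out.println(mimeType("report", pdfHead));
    }
}
```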

[0124] The output of the external program is captured and stored into two tags in the file
management system 500 referred to as system tags. System tags differ from other tags in
file management system 500 in that they cannot be directly modified by users of file
management system 500. According to one embodiment of the invention, system tags
start with the keywords "sys." or "slot.sys." for slot tags. Thus, "sys.mime" and
"sys.type" include the MIME type information: the actual MIME type is included in
sys.mime and a human readable version of the MIME type is included in sys.type. As
thus described, these two system tags are determined when the close trigger is invoked.

[0125] In some embodiments of the invention, when the close trigger is invoked, a 
request is queued for a cryptographic hash to be computed for the file. As this 
computation is both CPU and I/O intensive, it is queued for subsequent background
processing so as to not delay the close operation as would be apparent. In one 
embodiment, a single background thread is used for computing these hashes. 

[0126] In a similar manner, the close trigger may also queue a request to index the file.
Indexing the file facilitates free-text search of the contents of that file. In one
embodiment of the invention, file management system 500 integrates with a third-party
free-text search engine referred to as Lucene, though other engines could be used as
would be apparent. Indexing may also be done by a single background thread. 

[0127] When an email trigger is invoked, an email may be sent to a user based on various 
tags that are attached either to a file (for example, to send an email when the file is
modified), or that are attached to a particular tag (for example, to send an email when the
tag is modified). In some embodiments of the present invention, the contents of the email
are static. In other embodiments, the contents are fully configurable based on other tags
that could be read either from the file itself or from the tag volume.

[0128] When the email trigger is invoked, it evaluates various conditions and determines 
whether to send an email. For example, if a file is being dragged into a folder, the email 
trigger may be invoked. The email trigger would determine the parent folder associated
with the destination of the file and determine whether the tags on that folder indicate that 
an email should be sent. If so, in one embodiment of the invention, the email trigger 
includes code to connect to an email server (whose IP address is specified in a specific 
tag) and to deliver an email thereto. 

[0129] Different triggers may be called based on different system events, as have been 
described. The name of the trigger may be specified in a tag. When the file management
system 500 executes the trigger, it dynamically loads the trigger software, and calls it
according to a predefined interface. In one embodiment of the invention, the triggers
may be Java class files; a Java class loading mechanism is used to load the software; and
a Java interface is used to specify the standard calling conventions. For example, a file
"file.txt" may have a tag called "trigger.tag.my.tag" set to the value "MyTrigger." In this
example, whenever the tag "my.tag" for "file.txt" changes to a new value, file
management system 500 loads a Java class called "trigger.MyTrigger" and then uses the
"Trigger" interface to invoke that code.
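
Loading a trigger by name can be sketched with standard Java reflection. The sketch is hedged: the Trigger interface's single fire method and its parameters are assumptions made for illustration, since the text does not give the actual calling convention, and the example class is nested here rather than placed in a "trigger" package.

```java
final class TriggerLoaderSketch {

    /** Hypothetical calling convention; the real interface is not specified in the text. */
    public interface Trigger {
        String fire(String fileName, String tagName, String newValue);
    }

    /** Example trigger; in the described system its name would come from a tag value. */
    public static final class MyTrigger implements Trigger {
        public MyTrigger() { }

        @Override
        public String fire(String fileName, String tagName, String newValue) {
            return fileName + ": " + tagName + " -> " + newValue;
        }
    }

    /** Loads the named class dynamically and instantiates it through the Trigger interface. */
    static Trigger load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.asSubclass(Trigger.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load trigger " + className, e);
        }
    }

    public static void main(String[] args) {
        Trigger trigger = load(MyTrigger.class.getName());
        System.out.println(trigger.fire("file.txt", "my.tag", "v2"));
    }
}
```

The asSubclass call rejects any class that does not implement the interface before an instance is created.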

[0130] As mentioned above, the invention provides for placing tags on tags. In one
embodiment of the invention, this is implemented using a tag volume where all tags in
file management system 500 are reflected as folders. In this embodiment, the tag volume
itself corresponds to /volume root/tags/ and tags in file management system 500 descend
from this folder. For example, if you have a tag referred to as "sys.tag," within the tag
volume, it would be reflected in the filesystem as a folder called /volume
root/tags/sys/tag. According to one aspect of the invention, "dots" in the tag name are
replaced with "slashes" and appended onto a prefix for the tag volume. Each time a new
tag is created, a corresponding folder under that prefix is also created.
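
The mapping from a tag name to its folder in the tag volume is a simple string transformation, sketched below. The class and method names are hypothetical; the prefix is the one given in the text.

```java
final class TagVolumePathSketch {

    /** Prefix of the tag volume as given in the text. */
    static final String TAG_VOLUME_PREFIX = "/volume root/tags/";

    /** Replaces dots in the tag name with slashes and appends the result to the prefix. */
    static String folderFor(String tagName) {
        return TAG_VOLUME_PREFIX + tagName.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(folderFor("sys.tag"));  // prints "/volume root/tags/sys/tag"
    }
}
```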

[0131] However, deleting a tag from a file, even if it is the last occurrence of that tag
anywhere in the system, does not remove the corresponding folder from the tag volume.
This allows users to construct a tag naming convention hierarchy (taxonomy) regardless
of whether those tags are used. The notion of applying a tag on a tag, sometimes referred
to as meta-tagging, is implemented within this tag folder hierarchy. As discussed above,
tags on tags or "metatags" may be used to describe various attributes about a tag. In one
embodiment of the invention, metatags are applied to the sys.file tag by using the
previously described mechanisms to apply tags to the folder that corresponds to the tag in
the tag volume. For example, to apply the "tag.type" metatag to the tag called "sys.tag,"
the folder /volume root/tags/sys/tag would be located or created and the "tag.type" tag
would be applied to that folder.

[0132] Another aspect of the tag volume is that when a folder is deleted from the tag
volume, the corresponding tag will be deleted from every file with which that tag is
associated. A similar mechanism may be used to rename a tag.

[0133] In some embodiments of the invention, attached to the tag nodes in the tag
volume is a list in the form of a multi-valued tag. This list includes all of the values that
are associated with that multi-valued tag, as well as markers (in the form of other
metatags) indicating whether or not additional values are allowed.

[0134] File management system 500 includes a stream transaction block 550 that 
includes a hash transaction object 551 and an index transaction object 552. These objects 
include requests that are placed on the hash and index queues, respectively, that were
described above. These objects and their corresponding queues are persistent to maintain
consistency of files and file modifications and to facilitate recovery from server crashes.

[0135] In one embodiment of the invention, requests are added onto a queue by one 
session and pulled from the queue by another session. But as described above, each
session has a unique and consistent view of the object store. Thus, one session viewing
the queue within the context of an object store transaction does not see another session 
updating the queue. Once initiated, then, the hash transaction and index transaction 
objects would not see new requests entering the queue. In some conventional systems, 
these objects would periodically abort their session thereby updating their view of the 
object store, in order to see if new requests have arrived. This is a very inefficient 
solution. 




[0136] According to one aspect of the invention, this problem is overcome by using a
parallel non-persistent semaphore to manage these objects and their respective queues.
When volume 525 is mounted as described above, volume 525 determines a number of
objects within each queue. For each queue, volume 525 releases a corresponding number
of semaphores. As threads may only acquire as many semaphores as have been released,
when a thread attempts to acquire a semaphore object and none are available, the thread
waits until some other thread releases the corresponding semaphore.

[0137] When, for example, a hash transaction thread begins, it first attempts to acquire a 
semaphore object. If the thread acquires one, it knows that there must be a corresponding
object in the persistent queue. The thread may then join an object store session and start
an object store transaction. The thread then safely pulls an object off the queue and
begins processing it.

[0138] Correspondingly, after a new object is placed onto the queue and the 
corresponding transaction is successfully completed, the thread that placed the object
onto the queue releases the corresponding semaphore.
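
The pairing of a persistent queue with a non-persistent semaphore can be sketched using java.util.concurrent.Semaphore. This is a sketch under assumptions: PersistentQueue is a hypothetical in-memory stand-in for the queue kept in the object store, and the transaction around each queue operation is reduced to a comment.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Semaphore;

final class QueueSemaphoreSketch {

    /** Hypothetical stand-in for the persistent queue stored in the object store. */
    static final class PersistentQueue {
        private final Queue<String> items = new ArrayDeque<>();
        synchronized void add(String item) { items.add(item); }
        synchronized String poll() { return items.poll(); }
        synchronized int size() { return items.size(); }
    }

    /** Pairs the queue with a non-persistent semaphore counting the queued requests. */
    static final class HashQueue {
        private final PersistentQueue queue;
        private final Semaphore available;

        HashQueue(PersistentQueue queue) {
            this.queue = queue;
            // At mount time, release one permit per request already in the queue.
            this.available = new Semaphore(queue.size());
        }

        void enqueue(String file) {
            queue.add(file);       // stands in for "add inside a transaction and commit"
            available.release();   // signal only after the commit has succeeded
        }

        String take() {
            available.acquireUninterruptibly();  // waits until a request is known to exist
            return queue.poll();                 // stands in for "pull inside a transaction"
        }

        int permits() { return available.availablePermits(); }
    }

    public static void main(String[] args) throws InterruptedException {
        PersistentQueue stored = new PersistentQueue();
        stored.add("a.doc");                     // request left over from before the mount
        HashQueue hashQueue = new HashQueue(stored);

        Thread worker = new Thread(() -> {
            for (int i = 0; i < 2; i++) {
                System.out.println("hashing " + hashQueue.take());
            }
        });
        worker.start();
        hashQueue.enqueue("b.doc");              // wakes the worker for the second item
        worker.join();
    }
}
```

The worker never needs to refresh its view of the store to poll for new work; it blocks on the semaphore and opens a transaction only when a request is known to be present.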

[0139] The semaphore mechanism thus described is important because typically, object 
store 520 does not allow one session to synchronize on objects used by another session
for this kind of "thread-to-thread" synchronization. In fact, some object stores throw an
exception when that occurs in order to facilitate each session's unique and consistent 
view of the database. 

[0140] Once an object is pulled from the queue, hash transaction object 551 reads the
corresponding file and passes the data to a routine that computes a hash code. In one
embodiment of the invention, this hash code is a SHA-1 hash code implemented in Java
as is known.

[0141] According to one aspect of the invention, once determined, the resulting 160-bit
hash code is encoded into a relatively human-readable character string. In one
embodiment, the hash code is encoded into a 32-character string. In this embodiment,
every five bits of the 160-bit hash code is encoded as an ASCII character. The five bits
correspond to 32 values from the ASCII character set, namely:
{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,g,h,i,j,k,n,p,q,r,s,t,u,v,x,y,z}. As noted, four of the
traditional characters from the alphabet were excluded: 1) 'w' because its pronunciation
has multiple syllables and thus takes longer to say; 2) 'o' because it is often confused
with zero; 3) 'm' because it is confused with 'n'; and 4) 'l' because it is often confused
with one. This encoding results in a readily readable string for customer support
purposes, for example.
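
The five-bits-per-character encoding can be sketched as follows, using the standard java.security.MessageDigest for SHA-1. Several details are assumptions: the alphabet is taken to be the ten digits plus the lowercase letters with l, m, o and w removed, in that order, and bits are consumed most significant first. A 160-bit digest then yields 160 / 5 = 32 characters.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashEncodingSketch {

    // Assumed alphabet: digits, then lowercase letters without l, m, o and w (32 symbols).
    static final String ALPHABET = "0123456789abcdefghijknpqrstuvxyz";

    /** Encodes the bytes five bits at a time, most significant bits first (an assumption). */
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0;  // holds bits not yet emitted
        int bits = 0;    // number of valid bits in the buffer
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xff);
            bits += 8;
            while (bits >= 5) {
                bits -= 5;
                out.append(ALPHABET.charAt((buffer >>> bits) & 31));
            }
            buffer &= (1 << bits) - 1;  // keep only the unconsumed low bits
        }
        if (bits > 0) {
            // Pad a trailing partial group with zero bits; not needed for 160-bit input.
            out.append(ALPHABET.charAt((buffer << (5 - bits)) & 31));
        }
        return out.toString();
    }

    /** Computes the SHA-1 digest of the data and encodes it. */
    static String sha1Encoded(byte[] data) {
        try {
            return encode(MessageDigest.getInstance("SHA-1").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = sha1Encoded("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded + " (" + encoded.length() + " characters)");
    }
}
```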

[0142] The encoded string is stored into a tag whose name is passed as a parameter to the
hash transaction object. In one embodiment, this tag is referred to as "sys.hash.sha-1"
and a request to recompute the hash code is queued whenever a file is modified.

[0143] Index transaction object 552 pulls an object from its queue and constructs a
request for an external indexing program 555 to index the corresponding file. In one
embodiment, this external indexing program is a third-party software package referred to
as Lucene. Other indexing programs are available and could be used as would be
apparent. The external indexing program receives the contents of the file and some
metadata such as the date the file was modified, for example. In one embodiment of the
invention, indexing is performed for only two types of files: text files and HTML files.
These files are comprised of a stream of words readily processed by the external indexing
program. In other embodiments of the invention, a prefilter first converts binary files
(such as, for example, PDF files, Word files, etc.) into a stream of words and then passes
the stream onto the external indexing program. In other embodiments of the invention,
the external indexing program processes binary files directly as would be apparent.

[0144] The external indexing program uses a front-end filter 557, referred to sometimes
as a Grok analyzer 557, that performs various pre-processing steps on the stream of
words generated from the file being indexed. These steps may include tokenizing the
stream (determining where the breaks between words are), removing "'s" (apostrophe-s)
from the end of words, removing periods from acronyms, converting words to lower case,
removing common "stop" words (such as "a," "the," "and," "or," etc.) and performing
standard Porter stem filtering (removing common suffixes such as "-ing," "-ed," etc., and
mapping double suffixes to single ones, e.g., "-ize" plus "-ation" maps to "-ize"), etc.
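
Most of these pre-processing steps can be sketched in plain Java without reference to any particular indexing library. The sketch is hypothetical and deliberately incomplete: it tokenizes on whitespace only, uses a four-word stop list taken from the examples in the text, and omits Porter stemming entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerSketch {

    // Stop words limited to the examples named in the text.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "and", "or"));

    /** Applies the filtering steps in order; stemming is intentionally left out. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {                              // tokenize on whitespace
            String t = raw.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");      // trim edge punctuation
            t = t.replaceAll("'[sS]$", "");                                  // drop a trailing apostrophe-s
            t = t.replace(".", "");                                          // drop periods in acronyms
            t = t.toLowerCase(Locale.ROOT);                                  // convert to lower case
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {                   // remove stop words
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The U.S.A. and the user's files"));
    }
}
```

For the sample sentence, the sketch yields the tokens usa, user, and files; a stemming step would further reduce "files" to its stem.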

[0145] In one embodiment, the resulting text index files from the external indexing 
program are stored out in a file system 558 (or file system 508 as would be apparent).
Accordingly, in this embodiment, these text index files are not transactionally secure. In 
other embodiments, the resulting text index files are stored in object store 520 as would 
be apparent. 

[0146] File management system 500 also includes a socket manager 580 that is 
responsible for managing incoming connections used as pathways to execute other 
remote commands including XML commands and RMI commands. This mechanism
provides a parallel or alternate command path to file management system 500 similar to
that described as system operations through file system interface 502. Socket manager
580 handles XML commands. When a client attempts to connect to the server on a
specific port, socket manager 580 receives that connection. Socket manager 580
manages the number of connections, creates socket reader object 571 and socket writer
object 572, and delegates subsequent read and write operations to the corresponding
object. In one embodiment, these sockets are full duplex, thereby enabling parallel
reading and writing as would be apparent.

[0147] Socket reader object 571 reads the socket, packages each XML command packet,
attaches it to an object, and places that object onto a queue. Socket writer object 572, on
the other hand, reads a queue, serializes those objects from the queue, and outputs them
to the output socket.

[0148] Socket worker objects 565, which run in their own separate thread pools, pull
requests off of the corresponding input queue, parse the corresponding XML command,
determine a necessary action and, in some instances, actually execute many of the tasks
associated with these particular commands. More complex commands may be dispatched
to appropriate objects that know how to perform those commands.

[0149] For example, in one embodiment of the invention, commands to manipulate tags 
(i.e., getting tags, setting tags, removing tags, etc.) may enter file management system 
500 as XML commands via socket worker 565. After parsing the XML command, socket 
worker 565 performs path name lookups, etc., that may be required to obtain either a slot or 
an entry object and/or to set/remove tags, set/read/remove attributes, etc. 

[0150] Socket worker 565 is also responsible for constructing an appropriate response to 
the client for the requested operation. For example, if the incoming request asked for all 
of the tags associated with a particular file, socket worker 565 would first access volume 
manager 515 to parse the path name associated with the particular file into a slot object. 
Then, using the slot object, socket worker 565 accesses the corresponding entry object. 
The entry object includes methods that, for example, determine which tags are associated 
with that entry object. Using that data, socket worker 565 constructs an XML DOM 
object, which represents the response. Once constructed, socket worker 565 queues the 
DOM object up to the corresponding socket writer 572 associated with the client that 
issued the original request. 

[0151] In one embodiment, the requests are tagged with ID numbers, thereby allowing 
file management system 500 to operate completely asynchronously. This allows a client 
to submit many requests, one right after the other, without waiting for the responses to 
come back. Those requests are then queued and subsequently processed by a pool of 
socket workers. As the requests are completed (and not necessarily in the order in which 
they were received) and responses are constructed and placed on the output queue, socket 
writer 572 sends them out with the same ID marker associated with the original request. 
The client can then correlate the responses with the requests. 
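
By way of illustration only, the following Java sketch shows one way the ID-based correlation described above might work. The names Request, Response and process, the use of a standard thread pool, and the placeholder "done:" result are assumptions made for this example and are not taken from the described system.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: Request, Response and process() are hypothetical names.
final class AsyncCorrelationSketch {
    record Request(long id, String command) {}
    record Response(long id, String result) {}

    // Hands every request to a pool of workers; responses arrive in completion order.
    static BlockingQueue<Response> process(List<Request> requests, int workers) {
        BlockingQueue<Response> output = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (Request r : requests) {
            // Each response carries the ID of the request that produced it.
            pool.execute(() -> output.add(new Response(r.id(), "done:" + r.command())));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return output;
    }

    public static void main(String[] args) {
        List<Request> requests = List.of(
                new Request(1, "getTags"), new Request(2, "setTag"), new Request(3, "removeTag"));
        Map<Long, String> resultById = new HashMap<>();
        for (Response resp : process(requests, 2)) {
            resultById.put(resp.id(), resp.result()); // client-side correlation by ID
        }
        System.out.println(resultById);
    }
}
```

Because the client matches on the ID rather than on arrival order, the workers are free to finish in any order.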

[0152] File management system 500 also includes a notification object 560. At various 
points within the operation of file management system 500, such as when a new file or 
folder is added or when tags change in certain ways, certain events can be generated. 
According to one aspect of the invention, these events may generate XML messages that 
are sent to a client, in some instances, completely asynchronously. In order for the client 
to indicate its readiness to receive these events, the client sends a specific command 
referred to as a watch list command. The client collects the names of folders referred to 
by open windows on the client and forwards that as a watch list to the server. In this 
way, the server now knows which folders every user has open on every connection on 
every desktop. Whenever a new file is created, file management system 500 searches the 
watch lists of open folders to determine if any clients currently have a folder open that 
includes the newly created file. If so, then a corresponding event is sent asynchronously 
to all of those clients. According to various aspects of the invention, this mechanism 
works similarly for regular folders, search folders, and/or query folders. A similar 
mechanism also works for tags where, if a tag is changed on a file that is currently open 
on a user's desktop, then that user will receive an asynchronous event saying that that tag 
has been updated. 

[0153] Events may be scheduled to occur when, for example, a tag or file is deleted from 
any one of these open folders, a file is renamed, etc. Various objects in file management 
system 500 track which socket writer 572 or socket reader 571 corresponds to which 
user. In other words, within file management system 500 there exists a so-called "back 
path" from the watch list of open folders to the user. This back path enhances the lookup 
process, making it extremely fast. In one embodiment, the names of the folders are 
stored in hash tables with the output being a set of socket readers or socket writers that 
correspond to that particular user. Once this set is determined, an XML notification 
message may be constructed and queued for the corresponding socket writer. 
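
As a non-limiting illustration, the hash-table "back path" from folder names to client connections might be sketched in Java as follows. The class name WatchListSketch, the use of string connection identifiers in place of socket writer objects, and the method names are assumptions made for this example.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry mapping watched folder names to client connections.
final class WatchListSketch {
    // Folder path -> IDs of the connections (standing in for socket writers) that have it open.
    private final Map<String, Set<String>> watchers = new ConcurrentHashMap<>();

    // Replaces the watch list for one connection with the folders it now has open.
    void setWatchList(String connectionId, Collection<String> folders) {
        watchers.values().forEach(set -> set.remove(connectionId));
        for (String folder : folders) {
            watchers.computeIfAbsent(folder, f -> ConcurrentHashMap.newKeySet()).add(connectionId);
        }
    }

    // Returns the connections to notify when something changes in the given folder.
    Set<String> connectionsWatching(String folder) {
        return Set.copyOf(watchers.getOrDefault(folder, Set.of()));
    }

    public static void main(String[] args) {
        WatchListSketch registry = new WatchListSketch();
        registry.setWatchList("conn-1", List.of("/docs", "/specs"));
        registry.setWatchList("conn-2", List.of("/docs"));
        // A file created in /docs would be announced to both connections.
        System.out.println(registry.connectionsWatching("/docs"));
    }
}
```

A single hash lookup per event yields the full set of interested connections, which is consistent with the fast lookup described above.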

[0154] File management system 500 also includes an RMI interface 582 that operates in 
a manner similar to socket manager 580, the difference being that there is no XML in the RMI 
procedure call. In one embodiment, socket manager 580 and RMI interface 582 share 
common code (i.e., code exclusive of XML parsing, etc.) referred to herein as core calls 
584. Core calls 584 correspond to the common operations between the RMI interface 
and the XML interface. 

[0155] Other functions that may be included in various embodiments of file management 
system 500 may include logging, unit testing, miscellaneous utilities, etc. These 
functions are generally well known and may either be incorporated into the system or 
integrated therewith as third party tools. 

[0156] Another function that may be included in file management system 500 is an ID 
number manager (not illustrated). All file system node objects 530, including slots 532, 
entry objects 534, streams 538 and containers 537, have associated therewith an ID 
number. This ID number is unique on a per-volume basis. In some embodiments of the 
invention, the ID number is used to name the underlying blob on file system 508 that 
corresponds to this node object. As described above, each stream object 538 refers to a 
blob on file system 508 that corresponds to that stream, and the name of that blob is the 
ID number of that object. 

[0157] In some embodiments of the invention, ID numbers may be used to look up 
objects by their number, for example, with the free-text search index. When a file is 
indexed in the free-text search sense, its file name is not stored in the index. Otherwise, 
any time the file is renamed, it would have to be re-indexed. Instead, the ID number is 
used as the name of the index. When a lookup is performed during a free-text search, the 
returned hits include the ID numbers corresponding to the objects that were found. This 
ID number is used to determine which stream objects and, accordingly, which entry 
objects and which slot objects are implicated. From the slot objects, the name of the 
object can be determined. Using ID numbers in the index also facilitates a single index 
file regardless of whether the corresponding file is linked, live copied, a deferred copy, 
etc., as only one instance of that file resides on the disk and thus having multiple index 
files is unwarranted. 
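
The benefit of indexing by ID number rather than by file name can be illustrated with the following Java sketch. It is a simplified in-memory stand-in, not the external indexing program; the class name IdKeyedIndexSketch, its methods, and the whitespace-and-punctuation tokenizer are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory index: terms map to ID numbers, never to file names.
final class IdKeyedIndexSketch {
    private final Map<String, Set<Long>> idsByTerm = new HashMap<>();
    private final Map<Long, String> nameById = new HashMap<>(); // stands in for the slot lookup

    void addFile(long id, String name, String text) {
        nameById.put(id, name);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                idsByTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Renaming touches only the name mapping; the index entries are left alone.
    void rename(long id, String newName) {
        nameById.put(id, newName);
    }

    // Hits come back as IDs and are resolved to current names afterwards.
    List<String> search(String term) {
        List<String> names = new ArrayList<>();
        for (long id : idsByTerm.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of())) {
            names.add(nameById.get(id));
        }
        return names;
    }

    public static void main(String[] args) {
        IdKeyedIndexSketch index = new IdKeyedIndexSketch();
        index.addFile(42L, "draft.txt", "Quarterly budget report");
        index.rename(42L, "final.txt");
        System.out.println(index.search("budget"));
    }
}
```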

[0158] ID number manager assigns the ID numbers. According to one aspect of the 
invention, ID numbers are anchored in volume object 525. Because of the manner in 
which object store 520 operates, if each session were to access the volume object for a 
new ID number as the objects were created, a significant number of write/write collisions 
against the volume object would result. Instead, ID number manager operates using a 
single thread to assign the ID numbers. 

[0159] At start up, ID number manager requests a block of ID numbers from the volume 
object and places them one at a time onto a synchronized queue. While this queue is not 
persistent, the volume number update process is. More particularly, when the ID number 
manager asks for a block of ID numbers, that request is done in a persistent fashion: the 
updated volume object is written back to the object store so that the block that was 
requested is "remembered" if the file management system 500 were to crash. However, 
the queue in which these objects are placed is not persistent. Instead, the ID number 
manager writes only so many of the ID numbers, one at a time, to the synchronized 
queue. Thus, this queue has a limited depth. Furthermore, the ID number manager only 
has a limited number of these objects that it originally fetched from the volume object. 

[0160] In some embodiments, the ID number manager writes a few of these ID numbers 
into this queue and suspends until another thread removes a number from the queue. 
Threads requesting an ID number in order to create file system objects remove a number 
from the queue. In order to overcome problems associated with this queue being non- 
persistent, when the ID number manager has placed all of the ID numbers that it fetched 
from the volume manager on the queue, the ID number manager requests another block 
of ID numbers through an object store transaction. In this way, the volume object need 
only periodically re-persist to disk (i.e., update object store) based on the number of ID 
numbers fetched at any given time from the volume object. 
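
The block-allocation scheme described above might be sketched in Java as follows, purely by way of example. The class name IdNumberManagerSketch, the block size of 100, the queue depth of 4, and the use of an in-memory counter in place of the persisted volume object are all assumptions; a real embodiment would perform the block reservation as an object store transaction.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the ID number manager; sizes are arbitrary.
final class IdNumberManagerSketch {
    private static final int BLOCK_SIZE = 100; // IDs reserved per persistent update
    // Stands in for the high-water mark stored in the volume object.
    private final AtomicLong persistedHighWater = new AtomicLong(0);
    // Non-persistent queue of limited depth.
    private final BlockingQueue<Long> queue = new ArrayBlockingQueue<>(4);

    IdNumberManagerSketch() {
        Thread filler = new Thread(this::fill, "id-number-manager");
        filler.setDaemon(true);
        filler.start();
    }

    // Single thread: reserve a block, then feed it to the queue one ID at a time.
    private void fill() {
        try {
            while (true) {
                // In a real system this step would be an object store transaction.
                long end = persistedHighWater.addAndGet(BLOCK_SIZE);
                for (long id = end - BLOCK_SIZE + 1; id <= end; id++) {
                    queue.put(id); // suspends while the queue is full
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Called by threads that create file system objects.
    long nextId() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Highest ID covered by the persisted reservation.
    long reservedThrough() {
        return persistedHighWater.get();
    }

    public static void main(String[] args) {
        IdNumberManagerSketch ids = new IdNumberManagerSketch();
        System.out.println(ids.nextId() + " " + ids.nextId());
    }
}
```

Under this sketch, a crash loses at most the unissued remainder of the current block, since the reservation itself was recorded before any ID in the block was handed out.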

[0161] The tag volume is now described in further detail. As implemented in one 
embodiment of the invention, tag volume is implemented as a tag folder hierarchy. As 
described above, tags in file management system 500 are reflected into the file system as 
folder names. This is done by replacing the dots in a tag name with slashes, and then 
appending the resulting string to the root path of the tag volume. For example, with a tag 
volume root path of "/volume root/tags/" then a tag referred to as "sys.types" would be 
reflected in the file system as a folder named "/volume root/tags/sys/types." 
Furthermore, the folders corresponding to each tag are created at the time that the tags are 
first created. 
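
The mapping from a tag name to its folder can be expressed in a few lines of Java. The class and method names below are hypothetical and are used only for this example.

```java
// Hypothetical helper illustrating the tag-name-to-folder mapping.
final class TagPathSketch {
    // Replaces the dots in the tag name with slashes and appends it to the tag volume root.
    static String tagFolder(String tagVolumeRoot, String tagName) {
        String root = tagVolumeRoot.endsWith("/") ? tagVolumeRoot : tagVolumeRoot + "/";
        return root + tagName.replace('.', '/');
    }

    public static void main(String[] args) {
        // prints /volume root/tags/sys/types
        System.out.println(tagFolder("/volume root/tags/", "sys.types"));
    }
}
```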

[0162] As also described above, each tag can have one or more metatags applied to it. 
One purpose of the metatags is to affect the behavior of the tags to which they are 
applied. These metatags are now described in further detail. 

[0163] Each tag may include a type that is enforced at the time that the tag is set. One 
type of tag is a user type. A tag of user type has a value of the form of domain name/user 
name. Another type of tag is a date type. A tag of date type has an ISO standard date 
form. Another type of tag is an icon type. A tag of icon type must include a value that 
represents the name of an icon file found in the /volume root/tags folder. Another type of 
tag is a hash type. A tag of hash type has a form of a SSrcharacter long string (for 
encoded representation of SHA-1 hash code). Another type of tag is a trigger type. A 
trigger is the name of a Java class that will be verified to ensure that it exists, and 
that it is derived from the right subclass type to be a valid trigger. Another type of tag is 
a boolean type. A tag of boolean type can only be set to true or false. Other values are 
not allowed. Another type of tag is an email type. A tag of email type must include a 
properly formatted e-mail address including a user name and host name. Another type of 
tag is a password type. A tag of password type has the form of any string, but with the 
property of returning a string of asterisks (for example) rather than its exact value when 
the tag is read. Other tag types may exist, as would be apparent. 
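
A few of the tag types described above might be enforced as in the following Java sketch. The enum TagType, the method names, and in particular the deliberately loose patterns for the user and email forms are assumptions for illustration; they do not represent the validation rules of any particular embodiment.

```java
import java.util.regex.Pattern;

// Hypothetical type enforcement for a handful of the tag types described above.
final class TagTypeSketch {
    enum TagType { BOOLEAN, EMAIL, USER, PASSWORD }

    // Loose pattern: a user name and a host name separated by a single '@'.
    private static final Pattern EMAIL_FORM = Pattern.compile("[^@\\s]+@[^@\\s]+");
    // "domain name/user name" form.
    private static final Pattern USER_FORM = Pattern.compile("[^/]+/[^/]+");

    // Checked at the time the tag is set.
    static boolean isValid(TagType type, String value) {
        return switch (type) {
            case BOOLEAN -> value.equals("true") || value.equals("false");
            case EMAIL -> EMAIL_FORM.matcher(value).matches();
            case USER -> USER_FORM.matcher(value).matches();
            case PASSWORD -> true; // any string is accepted
        };
    }

    // Applied when the tag is read: password values are masked rather than returned.
    static String read(TagType type, String stored) {
        return type == TagType.PASSWORD ? "********" : stored;
    }

    public static void main(String[] args) {
        System.out.println(isValid(TagType.BOOLEAN, "maybe"));
        System.out.println(read(TagType.PASSWORD, "secret"));
    }
}
```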

[0164] Another metatag that is enforced on the volume manager is one that restricts the 
setting of new values. This metatag will not allow new values to be created for that tag. 
Another metatag records all current and past values for a particular tag. Whenever a new 
tag value is set to a particular tag name, this metatag, referred to as "tag.values," is updated 
so that it includes a current list of all the values that have ever been applied to that 
particular tag. This allows users to determine, by browsing the tag volume, which of the 
values of the tags are actually being used. Tags may also include a default value so that 
when the tag is set the default is used if no other value is provided. An owner of the tag 
may also be specified. This may be used to limit who can add, modify, delete, view, etc., 
certain tags. 

[0165] Tags may be assigned to a tag group, for example, by setting the "tag.group" 
metatag. Tags that have the same value for the "tag.group" metatag are considered to 
belong to the same tag group. When a single tag that belongs to a particular tag group is 
applied to a file, all of the other tags in that same tag group are also applied to that file. 
Similarly, when a tag belonging to a particular tag group is deleted from a file, all of the 
other tags in that tag group are also deleted. Tags in tag groups are intended to be applied 
and removed together. In some embodiments, if one tag in a tag group is changed and if 
any tag in the tag group has a trigger associated with it, the trigger will fire (whereas 
normally only the trigger associated with the tag that is changed would fire). 

[0166] In some embodiments of the invention, a metatag of type trigger may be assigned 
to a tag in the tag folder hierarchy. As described above, this corresponds to a Java class 
that gets invoked at various points in the operation of file management system 500. For 
example, triggers may be attached to file operations including opening, closing, reading, 
and/or writing of a file. Triggers may also be attached to metadata operations including 
changing a tag or changing an attribute. In addition, periodic triggers may be invoked, as 
would be apparent, without touching the system in any other way. Triggers may perform 
any number of operations including sending an e-mail, setting various tags, performing 
file operations, writing out to a log file, creating a new file based on some event, 
adjusting and/or modifying file attributes, freezing a file, etc., or any other operation that 
could be programmed using, for example, Java code. 

[0167] An example of a trigger is now described. One type of trigger contemplated by 
the invention is referred to as an approval trigger. The approval trigger is set up to fire 
whenever any approval-related tag changes. The approval trigger sets several approval 
status tags to indicate who has approved a file and who has not, including the various 
icon designations. These tags are then later interpreted by the user interface. This is 
all done based on a list of required approvers. The 
approval trigger may also send an e-mail if so designated by a tag attached to the file or 
a metatag attached to one of the tags. The approval trigger may also freeze the file if all of 
the approvers have approved the file. 

[0168] File management system 500 manages a set of approval-based triggers. In some 
embodiments, this set of triggers is managed on a user-by-user basis, so these tags may 
all include the security authentication domain and user name of the user who approved 
the file. For example, one tag associated with the approval might correspond to a date tag 
with the name "sys.signature domain.user.date." According to the invention, these tags 
are applied through a signature XML or RMI call rather than directly by the user. This 
ensures that a formal approval process is followed, that certain requirements have been 
met, that the users have been authenticated, etc. 

[0169] One embodiment of the invention implements four approval-based tags. These 
include a date tag, a hash code tag associated with the file, a status of the approval (for 
example, "signed" or "rejected"), and the approver's comments relating to their approval 
or rejection. 

[0170] In addition to the approval-based tags, this embodiment may also include a set of 
tags used to control whether other tags (such as the approval-based tags) are required on 
all the files that go into a folder. By setting these tags on a folder, then every time a file 
is created or moved in that folder, file management system 500 will require that the other 
tags are set; if not, the create or move operation will not be allowed. 

[0171] Another mechanism exists in file management system 500 similar to the tag 
volume described above. This mechanism is referred to as a user volume or a user folder 
hierarchy. As with the tag volume, all users of file management system 500 are reflected 
into the file system as a directory of their corresponding user IDs. For a user "rick" 
in domain "grokker," there would be a folder in file system 530 named "/volume 
root/users/grokker/rick." As described above, any number of tags can be attached to that 
folder to in effect describe that user. For example, these tags could include a human- 
friendly user name including a first name and a last name, an e-mail address, a password, 
a preferred language, as well as authentication tokens and pointers to authentication 
servers, etc. This folder may be linked to other folders, thereby designating groups or 
roles, for permission and access purposes. 

[0172] File management system 500 as thus described provides a framework for 
implementing various aspects of the invention that will now be described. The first of 
these aspects is "live copy" and "smart links." As described above, any file in file system 
530 has associated with it a slot 532, an entry 534, and a stream 538. When a live copy 
or smart link command is issued with respect to this file, the file system creates a second 
slot 532 that points to the existing entry 534, and thus the same stream 538. As has been 
described above, slots 532 include name information and entries 534 manage tags, and 
further, multiple slots 532 can point to a single entry 534. Thus, after the second slot is 
created, the file system, in effect, manages two names for the same underlying object. 
The live copy command also attaches a trigger to the second slot. This trigger is fired 
when the file is opened or closed, and manages the synchronization with remote systems. 

[0173] A similar mechanism may also be used for smart caching and smart backup. A 
cache or backup trigger is attached to a file so that when the file is opened or closed, the 
trigger can access a remote cache, synchronize a local copy, or in the case of a backup, 
send the modified file off to a backup store. 

[0174] Deferred copies are implemented using a slot and entry pair. The file system 
permits more than one slot-entry pair to point to the same underlying item 536. As 
described above, the slot manages the name (so the underlying item can have multiple 
names) and the entry manages the tags (implying that the underlying item can have 
different sets of tags). The deferred copy command creates a second slot-entry pair 
pointing to the same underlying item. The deferred copy provides extremely fast server 
side copies of an item because the underlying item (including its associated blob, in the 
case of a stream) is not copied. When the underlying item is opened for writing or 
modification, the volume manager detects the multiple entries pointing to the same item 
and only then is a copy of the underlying item made. At that time, the second slot-entry 
pair is adjusted to point at the copy, as would be apparent. 
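
The deferred copy behavior described above is a form of copy-on-write and might be sketched in Java as follows. The classes Slot, Entry and Item below are simplified, hypothetical stand-ins for slot 532, entry 534 and item 536; the per-item entry counter is an assumption about how multiple entries pointing to one item could be detected. In this sketch the pair that is opened for writing is the one repointed at the copy.

```java
// Hypothetical, simplified slot/entry/item model for illustrating deferred copies.
final class DeferredCopySketch {
    static final class Item {
        byte[] data;
        int entryCount; // number of entries currently pointing at this item
        Item(byte[] data) { this.data = data; }
    }

    static final class Entry {
        Item item;
        Entry(Item item) { this.item = item; item.entryCount++; }
    }

    static final class Slot {
        final String name;
        final Entry entry;
        Slot(String name, Entry entry) { this.name = name; this.entry = entry; }
    }

    // Deferred copy: a new slot-entry pair over the same item; no bytes are copied.
    static Slot deferredCopy(Slot source, String newName) {
        return new Slot(newName, new Entry(source.entry.item));
    }

    // Opening for write splits a shared item so the other entries keep the old content.
    static Item openForWrite(Slot slot) {
        Item shared = slot.entry.item;
        if (shared.entryCount > 1) {
            shared.entryCount--;
            slot.entry.item = new Item(shared.data.clone());
            slot.entry.item.entryCount = 1;
        }
        return slot.entry.item;
    }

    public static void main(String[] args) {
        Slot original = new Slot("a.txt", new Entry(new Item(new byte[] {1, 2})));
        Slot copy = deferredCopy(original, "b.txt");
        System.out.println(copy.entry.item == original.entry.item); // still shared
        openForWrite(copy).data[0] = 9;
        System.out.println(copy.entry.item == original.entry.item); // now separate
    }
}
```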

[0175] Identical files are detected using the hash code described above. Whenever a file 
is modified and closed, a background thread calculates a new hash code for that file. The 
new hash code is stored in a tag associated with that file. This causes, through a trigger 
mechanism, file management system 500 to compare the new hash code with the hash 
codes of other files in the system to identify identical files in the file system. According 
to one embodiment, the file system objects, namely the slot-entry pairs, are rearranged to 
resemble a deferred copy, and the duplicate blob is removed from disk. Identical files are 
thus combined, thereby freeing disk space. 
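
For illustration, duplicate detection by SHA-1 hash code might look like the following Java sketch, which uses the standard java.security.MessageDigest class. The class name DuplicateDetectionSketch, the register method, and the choice of a hexadecimal encoding for the hash are assumptions for this example; the rearrangement of slot-entry pairs and removal of the duplicate blob are not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical duplicate detector keyed on the SHA-1 hash of file content.
final class DuplicateDetectionSketch {
    // Hash code -> name of the first file seen with that content.
    private final Map<String, String> firstFileByHash = new HashMap<>();

    // Hex encoding is an assumption; the source does not specify the encoding here.
    static String sha1Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is a required JDK algorithm
        }
    }

    // Records the file's hash; returns an earlier file with identical content, or null.
    String register(String fileName, byte[] content) {
        return firstFileByHash.putIfAbsent(sha1Hex(content), fileName);
    }

    public static void main(String[] args) {
        DuplicateDetectionSketch detector = new DuplicateDetectionSketch();
        byte[] text = "same content".getBytes(StandardCharsets.UTF_8);
        detector.register("a.txt", text);
        System.out.println(detector.register("b.txt", text)); // prints a.txt
    }
}
```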

[0176] Frozen files are implemented by attaching a frozen attribute as a boolean field to 
an entry object associated with the file. Whenever this file is opened, this field is 
examined to determine the allowed operations. Nothing happens if the file is opened for 
reading. However, if the file is opened for writing or creating, an error will be thrown and 
that operation will be prevented. In some embodiments, this field may also be examined 
when tags are set so that tags on a frozen file cannot be modified, added, deleted, etc. In 
one embodiment of the invention, a frozen file is akin to a permanent read-only file, 
including its tags. In various embodiments of the invention, the only operations allowed 
on a frozen file are reading and renaming. 
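
The frozen check performed at open time can be illustrated with a short Java sketch. The names FrozenFileSketch, OpenMode and checkOpen, and the choice of exception type, are assumptions for this example.

```java
// Hypothetical open-time check against a boolean frozen field on the entry.
final class FrozenFileSketch {
    enum OpenMode { READ, WRITE, CREATE }

    static final class Entry {
        boolean frozen;
    }

    // Reading is always allowed; writing or creating over a frozen file is rejected.
    static void checkOpen(Entry entry, OpenMode mode) {
        if (entry.frozen && mode != OpenMode.READ) {
            throw new IllegalStateException("file is frozen; " + mode + " is not allowed");
        }
    }

    public static void main(String[] args) {
        Entry entry = new Entry();
        entry.frozen = true;
        checkOpen(entry, OpenMode.READ); // no effect
        try {
            checkOpen(entry, OpenMode.WRITE);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```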

[0177] Query folders are implemented through query tags attached to the folder. Query 
tags differ from other tags described above in that they can only be attached to empty 
folders. When these tags are set, special links are made to all of the files that match the 
query. These links are updated when either the query tags change or when one of the 
files matching the query changes. 

[0178] Search folders are implemented in a similar fashion; however, instead of 
performing a search using the tag mechanism described above, the search folder utilizes a 
free-text search engine. As described above, the search engine returns the file ID based 
on a provided search string and the file ID is used to get the file name. 

[0179] File versions are created automatically, either when a user does a file create on 
top of an existing file, or when file management system 500 detects a renaming sequence. 
For example, Microsoft Word uses a renaming sequence that renames the original file to 
a backup file and then renames a temporary file to the name of the original file. The file 
system implements and manages versions by maintaining a linked list of entries with 
various state bits that control whether or not those entries are shown in directories when 
the directories are enumerated. When the directory is enumerated, the file system uses 
these state bits to determine which versions to display based on, for example, user 
preferences. In one embodiment, older versions of files have an ISO standard date 
encoded into their names for use and discrimination by other systems, along with the 
word "version". This encoding also avoids name collisions as would happen, for 
example, if all the versions had the same name as the original file. In some 
embodiments, automatically-created versions can also be renamed with a name chosen by 
the user. 

[0180] Copy pedigrees are also implemented by file management system 500. When 
copies are created using, for example, a server side copy command, the server tracks 
these copy operations by having each entry object forward point to a collection of other 
entries that are copies thereof. Likewise, each entry object may also backward point to 
the entry from which it was copied. File management system 500 responds to 
appropriate XML and RMI commands to present these copy pedigrees in a user 
interface in an appropriate form to illustrate the migration of copies from place to place. 

[0181] Undeleting files is implemented as set forth below. As files are deleted, their 
corresponding slot objects are renamed and a field in the slot object is set to indicate that 
the slot has been deleted. When a directory is enumerated, deleted slots are not shown. 
This process is reversed when a file is undeleted. The field in the slot is unset and the 
name is changed back to its original value. In an analogous way to versions, deleted 
filenames are marked with the string "deleted" and the date that the file was deleted. 
When these files are undeleted, their names are marked with the string "undeleted" and 
the date that they were undeleted. File management system 500 responds to an 
appropriate XML or RMI command to toggle a per-user boolean value, managed in 
container 537, which in turn controls whether the deleted files are shown when the 
corresponding user enumerates the container. With this field enabled, users can see 
deleted files in the same context where they were originally located. 
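
The delete, undelete and enumeration behavior described above might be sketched in Java as follows. The class UndeleteSketch, the exact format of the "deleted" marking, and the storing of the original name in the slot are assumptions for this example. The sketch takes the simpler reading in which undeleting restores the original name, and it does not model the "undeleted" marking.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical slot model illustrating soft delete and undelete.
final class UndeleteSketch {
    static final class Slot {
        String name;
        String originalName;
        boolean deleted;
        Slot(String name) { this.name = name; }
    }

    // Deleting renames the slot and sets its deleted field; nothing is removed.
    static void delete(Slot slot, LocalDate when) {
        slot.originalName = slot.name;
        slot.name = slot.name + " (deleted " + when + ")"; // marking format is illustrative
        slot.deleted = true;
    }

    // Undeleting reverses the process.
    static void undelete(Slot slot) {
        slot.name = slot.originalName;
        slot.deleted = false;
    }

    // Deleted slots are listed only when the per-user flag is enabled.
    static List<String> enumerate(List<Slot> slots, boolean showDeleted) {
        List<String> names = new ArrayList<>();
        for (Slot s : slots) {
            if (showDeleted || !s.deleted) {
                names.add(s.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Slot a = new Slot("a.txt");
        Slot b = new Slot("b.txt");
        delete(a, LocalDate.of(2002, 12, 19));
        System.out.println(enumerate(List.of(a, b), false));
        System.out.println(enumerate(List.of(a, b), true));
    }
}
```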

[0182] Type folders are implemented with a special tag on the folder that file 
management system 500 examines prior to allowing a file to be added there. If the file 
does not match the specified type, the system will not allow the file to be placed in that 
folder. 